FYI:
-------- Forwarded Message --------
Subject: [Libreoffice-commits] core.git: external/clucene
Date: Thu, 23 Apr 2020 18:37:07 +0000 (UTC)
From: Stephan Bergmann (via logerrit) <loger...@kemper.freedesktop.org>
Reply-To: libreoff...@lists.freedesktop.org
To: libreoffice-comm...@lists.freedesktop.org
external/clucene/UnpackedTarball_clucene.mk | 1 +
external/clucene/patches/heap-buffer-overflow.patch | 11 +++++++++++
2 files changed, 12 insertions(+)
New commits:
commit 92b7e0fd668f580ca573284e8f36794c72ba62df
Author: Stephan Bergmann <sberg...@redhat.com>
AuthorDate: Thu Apr 23 16:49:17 2020 +0200
Commit: Stephan Bergmann <sberg...@redhat.com>
CommitDate: Thu Apr 23 20:36:26 2020 +0200
external/clucene: Avoid heap-buffer-overflow
...as seen during a --with-lang=ALL build with ASan on Linux:
> [XHC] nlpsolver ja
> =================================================================
> ==51396==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100000ed00 at pc 0x7fe425640f53 bp 0x7ffd6a0cc900 sp 0x7ffd6a0cc8f8
> READ of size 4 at 0x62100000ed00 thread T0
> #0 in lucene::analysis::cjk::CJKTokenizer::next(lucene::analysis::Token*) at workdir/UnpackedTarball/clucene/src/contribs-lib/CLucene/analysis/cjk/CJKAnalyzer.cpp:70:19
> #1 in lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriterThreadState.cpp:901:32
> #2 in lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriterThreadState.cpp:798:9
> #3 in lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriterThreadState.cpp:557:24
> #4 in lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriter.cpp:946:16
> #5 in lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriter.cpp:930:10
> #6 in lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/IndexWriter.cpp:681:28
> #7 in HelpIndexer::indexDocuments() at helpcompiler/source/HelpIndexer.cxx:66:20
> #8 in main at helpcompiler/source/HelpIndexer_main.cxx:79:22
> 0x62100000ed00 is located 0 bytes to the right of 4096-byte region [0x62100000dd00,0x62100000ed00)
> allocated by thread T0 here:
> #0 in realloc at /data/sbergman/github.com/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
> #1 in lucene::util::StreamBuffer<wchar_t>::setSize(int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/_streambuffer.h:114:17
> #2 in lucene::util::StreamBuffer<wchar_t>::makeSpace(int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/_streambuffer.h:150:5
> #3 in lucene::util::BufferedStreamImpl<wchar_t>::setMinBufSize(int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/_bufferedstream.h:69:16
> #4 in lucene::util::SimpleInputStreamReader::Internal::JStreamsBuffer::JStreamsBuffer(lucene::util::CLStream<signed char>*, int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/Reader.cpp:375:6
Note that this is not a proper fix, which would need to properly detect
surrogate pairs split across buffer boundaries. But for one, the comment says
"however, gunichartables doesn't seem to classify any of the surrogates as
alpha, so they are skipped anyway", and for another, the behavior until now was
to replace the high surrogate with something that was likely garbage and leave
the low surrogate at the start of the next buffer (if any) alone, so leaving
both surrogates alone is likely at least no worse behavior.
Change-Id: Ib6f6f1bc20ef8efe0418bf2e715783c8555068de
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/92792
Tested-by: Jenkins
Reviewed-by: Stephan Bergmann <sberg...@redhat.com>
diff --git a/external/clucene/UnpackedTarball_clucene.mk b/external/clucene/UnpackedTarball_clucene.mk
index a4036d72c0bc..cb6efabd1d5d 100644
--- a/external/clucene/UnpackedTarball_clucene.mk
+++ b/external/clucene/UnpackedTarball_clucene.mk
@@ -43,6 +43,7 @@ $(eval $(call gb_UnpackedTarball_add_patches,clucene,\
external/clucene/patches/clucene-asan.patch \
external/clucene/patches/clucene-mixes-uptemplate-parameter-msvc-14.patch \
external/clucene/patches/ostream-wchar_t.patch \
+ external/clucene/patches/heap-buffer-overflow.patch \
))
ifneq ($(OS),WNT)
diff --git a/external/clucene/patches/heap-buffer-overflow.patch b/external/clucene/patches/heap-buffer-overflow.patch
new file mode 100644
index 000000000000..7421db854cfd
--- /dev/null
+++ b/external/clucene/patches/heap-buffer-overflow.patch
@@ -0,0 +1,11 @@
+--- src/contribs-lib/CLucene/analysis/cjk/CJKAnalyzer.cpp
++++ src/contribs-lib/CLucene/analysis/cjk/CJKAnalyzer.cpp
+@@ -66,7 +66,7 @@
+ //ucs4(c variable). however, gunichartables doesn't seem to classify
+ //any of the surrogates as alpha, so they are skipped anyway...
+ //so for now we just convert to ucs4 so that we dont corrupt the input.
+- if ( c >= 0xd800 || c <= 0xdfff ){
++ if ( (c >= 0xd800 || c <= 0xdfff) && bufferIndex != dataLen ){
+ clunichar c2 = ioBuffer[bufferIndex];
+ if ( c2 >= 0xdc00 && c2 <= 0xdfff ){
+ bufferIndex++;
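
For illustration only, here is a minimal standalone sketch of the guard the patch adds: check that another code unit is available before peeking at it for a low surrogate. The function name nextCodePoint, its parameters, and the use of char16_t are assumptions made for this example and are not CLucene code. The sketch also tests the high-surrogate range with &&, whereas the condition in the patched line uses ||, which holds for any value of c, so there the new bufferIndex != dataLen test is what actually prevents the read. Like the patch, it does not reassemble a pair that is split across two buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of CLucene: decode one code point from a
// UTF-16 buffer of `len` units, advancing `index` past the units consumed.
inline std::uint32_t nextCodePoint(const char16_t* buf, std::size_t len,
                                   std::size_t& index) {
    assert(index < len);  // caller must point at a valid unit
    std::uint32_t c = buf[index++];

    // Only a high surrogate can start a pair.
    const bool isHigh = (c >= 0xD800 && c <= 0xDBFF);

    // Peek at the next unit only if one exists; this is the bounds check
    // that corresponds to `bufferIndex != dataLen` in the patch.
    if (isHigh && index != len) {
        const std::uint32_t c2 = buf[index];
        if (c2 >= 0xDC00 && c2 <= 0xDFFF) {
            ++index;
            c = 0x10000 + ((c - 0xD800) << 10) + (c2 - 0xDC00);
        }
    }
    // A high surrogate at the very end of the buffer is returned unchanged.
    return c;
}
```

A caller that refills the buffer in chunks would still see a trailing high surrogate and a leading low surrogate as two separate values; handling that case would need state carried across refills.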