No rush, Itamar. I managed to rebuild my index using a restart script 
and incremental indexing, so this isn't, strictly speaking, a critical bug.
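
For readers wondering what such a workaround could look like: the message does not describe the restart script, so the following is only a sketch of one plausible shape, not the code that was actually used. Each run indexes a bounded batch, records its progress in a checkpoint file, and exits so the operating system reclaims whatever memory had accumulated; an outer script reruns the program until nothing remains. The names readCheckpoint, writeCheckpoint, indexDocument and runOneBatch are hypothetical, and indexDocument is a placeholder for appending one document to an existing index.

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <string>

// Reads how many documents earlier runs already indexed; 0 if no checkpoint.
long readCheckpoint(const std::string& path) {
    std::ifstream in(path.c_str());
    long done = 0;
    if (!(in >> done)) done = 0;
    return done;
}

// Overwrites the checkpoint with the new count of indexed documents.
void writeCheckpoint(const std::string& path, long done) {
    std::ofstream out(path.c_str(), std::ios::trunc);
    out << done << '\n';
}

// Hypothetical placeholder: a real indexer would append document `id`
// to an index opened without recreating it (an assumption, not shown here).
void indexDocument(long /*id*/) {}

// Indexes at most batchSize documents, then returns true if more remain.
bool runOneBatch(const std::string& checkpointPath, long totalDocs, long batchSize) {
    assert(batchSize > 0 && totalDocs >= 0);
    const long start = readCheckpoint(checkpointPath);
    const long end = std::min(totalDocs, start + batchSize);
    for (long id = start; id < end; ++id) indexDocument(id);
    writeCheckpoint(checkpointPath, end);
    return end < totalDocs;
}
```

A wrapper script would call the program in a loop for as long as runOneBatch reports that documents remain, so no single process lives long enough to exhaust memory.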

Itamar Syn-Hershko wrote:
> Hi Michael,
> 
> Thanks for another report. I will dive into it as soon as I can arrange some
> time for it. Still owe you a look on that 64bit patch...
> 
> Itamar. 
> 
> -----Original Message-----
> From: Michael Levin [mailto:mele...@stanford.edu] 
> Sent: Sunday, November 15, 2009 7:48 AM
> To: clucene-developers@lists.sourceforge.net
> Subject: Re: [CLucene-dev] PerFieldAnalyzerWrapper memory leak
> 
> Itamar,
> 
> Sorry to bother you about this again but I am rebuilding my 47gb index and I
> think there is a memory leak in CLucene again. CLucene is eating up 1.4gb of
> RAM and steadily rising. The leak is not as severe as the previous one but
> it is still preventing indexing from finishing on my machine.
> 
> I ran valgrind while building a smaller index and it reported the following
> leaks in my indexer:
> 
>> ==25753== Memcheck, a memory error detector
>> ==25753== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
>> ==25753== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
>> ==25753== Command: ./index -d data/test1 -o data/test1/articles -a data/test1/authors -s data/stopwords.txt -O -R 128
>> ==25753== Parent PID: 2683
>> ==25753==
>> ==25753== HEAP SUMMARY:
>> ==25753==     in use at exit: 264 bytes in 8 blocks
>> ==25753==   total heap usage: 1,892,514 allocs, 1,892,506 frees, 994,487,471 bytes allocated
>> ==25753==
>> ==25753== 4 bytes in 1 blocks are still reachable in loss record 1 of 8
>> ==25753==    at 0x4025390: operator new(unsigned int)
> (vg_replace_malloc.c:214)
>> ==25753==    by 0x426D7DB: lucene::search::Similarity::getDefault()
> (Similarity.cpp:143)
>> ==25753==    by 0x4249172:
> lucene::index::IndexWriter::init(lucene::store::Directory*,
> lucene::analysis::Analyzer*, bool, bool,
> lucene::index::IndexDeletionPolicy*, bool) (IndexWriter.cpp:201)
>> ==25753==    by 0x4249C39: lucene::index::IndexWriter::IndexWriter(char
> const*, lucene::analysis::Analyzer*, bool) (IndexWriter.cpp:153)
>> ==25753==    by 0x8050841: main (index.cc:292)
>> ==25753==
>> ==25753== 12 bytes in 1 blocks are still reachable in loss record 2 of 8
>> ==25753==    at 0x4025390: operator new(unsigned int)
> (vg_replace_malloc.c:214)
>> ==25753==    by 0x422104D: global constructors keyed to
> TermVectorReader.cpp (TermVectorReader.cpp:417)
>> ==25753==    by 0x429184C: ??? (in
> /usr/local/lib/libclucene-core.so.0.9.23.0)
>> ==25753==    by 0x41C2553: ??? (in
> /usr/local/lib/libclucene-core.so.0.9.23.0)
>> ==25753==    by 0x400D8BB: call_init (dl-init.c:70)
>> ==25753==    by 0x400DA20: _dl_init (dl-init.c:134)
>> ==25753==    by 0x400088E: ??? (in /lib/ld-2.10.1.so)
>> ==25753==
>> ==25753== 20 bytes in 1 blocks are still reachable in loss record 3 of 8
>> ==25753==    at 0x4025390: operator new(unsigned int)
> (vg_replace_malloc.c:214)
>> ==25753==    by 0x41C76FB: lucene::util::_ThreadLocal::set(void*)
> (ThreadLocal.cpp:177)
>> ==25753==    by 0x41E9D38:
> lucene::analysis::Analyzer::setPreviousTokenStream(void*)
> (_ThreadLocal.h:82)
>> ==25753==    by 0x41E5F18:
> lucene::analysis::WhitespaceAnalyzer::reusableTokenStream(wchar_t const*,
> lucene::util::Reader*) (Analyzers.cpp:120)
>> ==25753==    by 0x41E7282:
> lucene::analysis::PerFieldAnalyzerWrapper::reusableTokenStream(wchar_t
> const*, lucene::util::Reader*) (Analyzers.cpp:327)
>> ==25753==    by 0x421D0F8:
> lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::
> document::Field*, lucene::analysis::Analyzer*, int)
> (DocumentsWriterThreadState.cpp:889)
>> ==25753==    by 0x421F170:
> lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene:
> :analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>> ==25753==    by 0x421F586:
> lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysi
> s::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>> ==25753==    by 0x4212F73:
> lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*,
> lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>> ==25753==    by 0x42130C6:
> lucene::index::DocumentsWriter::addDocument(lucene::document::Document*,
> lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>> ==25753==    by 0x4251011:
> lucene::index::IndexWriter::addDocument(lucene::document::Document*,
> lucene::analysis::Analyzer*) (IndexWriter.cpp:670)
>> ==25753==    by 0x804FBFD: indexArticle(lucene::index::IndexWriter*,
> lucene::index::IndexWriter*, lucene::document::Document*, ISIIssue*,
> ISIArticle*, char const*, char const*) (index.cc:122)
>> ==25753==
>> ==25753== 24 bytes in 1 blocks are still reachable in loss record 4 of 8
>> ==25753==    at 0x4025390: operator new(unsigned int)
> (vg_replace_malloc.c:214)
>> ==25753==    by 0x41C77A4: lucene::util::_ThreadLocal::set(void*)
> (new_allocator.h:89)
>> ==25753==    by 0x41E9D38:
> lucene::analysis::Analyzer::setPreviousTokenStream(void*)
> (_ThreadLocal.h:82)
>> ==25753==    by 0x41E5F18:
> lucene::analysis::WhitespaceAnalyzer::reusableTokenStream(wchar_t const*,
> lucene::util::Reader*) (Analyzers.cpp:120)
>> ==25753==    by 0x41E7282:
> lucene::analysis::PerFieldAnalyzerWrapper::reusableTokenStream(wchar_t
> const*, lucene::util::Reader*) (Analyzers.cpp:327)
>> ==25753==    by 0x421D0F8:
> lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::
> document::Field*, lucene::analysis::Analyzer*, int)
> (DocumentsWriterThreadState.cpp:889)
>> ==25753==    by 0x421F170:
> lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene:
> :analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>> ==25753==    by 0x421F586:
> lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysi
> s::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>> ==25753==    by 0x4212F73:
> lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*,
> lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>> ==25753==    by 0x42130C6:
> lucene::index::DocumentsWriter::addDocument(lucene::document::Document*,
> lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>> ==25753==    by 0x4251011:
> lucene::index::IndexWriter::addDocument(lucene::document::Document*,
> lucene::analysis::Analyzer*) (IndexWriter.cpp:670)
>> ==25753==    by 0x804FBFD: indexArticle(lucene::index::IndexWriter*,
> lucene::index::IndexWriter*, lucene::document::Document*, ISIIssue*,
> ISIArticle*, char const*, char const*) (index.cc:122)
>> ==25753==
>> ==25753== 32 bytes in 1 blocks are still reachable in loss record 5 of 8
>> ==25753==    at 0x4025390: operator new(unsigned int)
> (vg_replace_malloc.c:214)
>> ==25753==    by 0x4284B8D: global constructors keyed to
> FieldSortedHitQueue.cpp (FieldSortedHitQueue.cpp:56)
>> ==25753==    by 0x429184C: ??? (in
> /usr/local/lib/libclucene-core.so.0.9.23.0)
>> ==25753==    by 0x41C2553: ??? (in
> /usr/local/lib/libclucene-core.so.0.9.23.0)
>> ==25753==    by 0x400D8BB: call_init (dl-init.c:70)
>> ==25753==    by 0x400DA20: _dl_init (dl-init.c:134)
>> ==25753==    by 0x400088E: ??? (in /lib/ld-2.10.1.so)
>> ==25753==
>> ==25753== 32 bytes in 1 blocks are still reachable in loss record 6 of 8
>> ==25753==    at 0x4025390: operator new(unsigned int)
> (vg_replace_malloc.c:214)
>> ==25753==    by 0x41C77EB: lucene::util::_ThreadLocal::set(void*)
> (ThreadLocal.cpp:173)
>> ==25753==    by 0x41E9D38:
> lucene::analysis::Analyzer::setPreviousTokenStream(void*)
> (_ThreadLocal.h:82)
>> ==25753==    by 0x41E5F18:
> lucene::analysis::WhitespaceAnalyzer::reusableTokenStream(wchar_t const*,
> lucene::util::Reader*) (Analyzers.cpp:120)
>> ==25753==    by 0x41E7282:
> lucene::analysis::PerFieldAnalyzerWrapper::reusableTokenStream(wchar_t
> const*, lucene::util::Reader*) (Analyzers.cpp:327)
>> ==25753==    by 0x421D0F8:
> lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::
> document::Field*, lucene::analysis::Analyzer*, int)
> (DocumentsWriterThreadState.cpp:889)
>> ==25753==    by 0x421F170:
> lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene:
> :analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>> ==25753==    by 0x421F586:
> lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysi
> s::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>> ==25753==    by 0x4212F73:
> lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*,
> lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>> ==25753==    by 0x42130C6:
> lucene::index::DocumentsWriter::addDocument(lucene::document::Document*,
> lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>> ==25753==    by 0x4251011:
> lucene::index::IndexWriter::addDocument(lucene::document::Document*,
> lucene::analysis::Analyzer*) (IndexWriter.cpp:670)
>> ==25753==    by 0x804FBFD: indexArticle(lucene::index::IndexWriter*,
> lucene::index::IndexWriter*, lucene::document::Document*, ISIIssue*,
> ISIArticle*, char const*, char const*) (index.cc:122)
>> ==25753==
>> ==25753== 64 bytes in 1 blocks are still reachable in loss record 7 of 8
>> ==25753==    at 0x4025390: operator new(unsigned int)
> (vg_replace_malloc.c:214)
>> ==25753==    by 0x41C851F: std::vector<lucene::util::_ThreadLocal*,
> std::allocator<lucene::util::_ThreadLocal*>
>> ::_M_insert_aux(__gnu_cxx::__normal_iterator<lucene::util::_ThreadLocal**,
> std::vector<lucene::util::_ThreadLocal*,
> std::allocator<lucene::util::_ThreadLocal*> > >, lucene::util::_ThreadLocal*
> const&) (new_allocator.h:89)
>> ==25753==    by 0x41C74CD:
> lucene::util::ThreadLocals::add(lucene::util::_ThreadLocal*)
> (stl_vector.h:741)
>> ==25753==    by 0x41C7586: lucene::util::_ThreadLocal::set(void*)
> (ThreadLocal.cpp:180)
>> ==25753==    by 0x41E9D38:
> lucene::analysis::Analyzer::setPreviousTokenStream(void*)
> (_ThreadLocal.h:82)
>> ==25753==    by 0x41E5F18:
> lucene::analysis::WhitespaceAnalyzer::reusableTokenStream(wchar_t const*,
> lucene::util::Reader*) (Analyzers.cpp:120)
>> ==25753==    by 0x41E7282:
> lucene::analysis::PerFieldAnalyzerWrapper::reusableTokenStream(wchar_t
> const*, lucene::util::Reader*) (Analyzers.cpp:327)
>> ==25753==    by 0x421D0F8:
> lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::
> document::Field*, lucene::analysis::Analyzer*, int)
> (DocumentsWriterThreadState.cpp:889)
>> ==25753==    by 0x421F170:
> lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene:
> :analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>> ==25753==    by 0x421F586:
> lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysi
> s::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>> ==25753==    by 0x4212F73:
> lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*,
> lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>> ==25753==    by 0x42130C6:
> lucene::index::DocumentsWriter::addDocument(lucene::document::Document*,
> lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>> ==25753==
>> ==25753== 76 bytes in 1 blocks are still reachable in loss record 8 of 8
>> ==25753==    at 0x4025390: operator new(unsigned int)
> (vg_replace_malloc.c:214)
>> ==25753==    by 0x41FBA32: global constructors keyed to
> IndexFileNameFilter.cpp (IndexFileNameFilter.cpp:8)
>> ==25753==    by 0x429184C: ??? (in
> /usr/local/lib/libclucene-core.so.0.9.23.0)
>> ==25753==    by 0x41C2553: ??? (in
> /usr/local/lib/libclucene-core.so.0.9.23.0)
>> ==25753==    by 0x400D8BB: call_init (dl-init.c:70)
>> ==25753==    by 0x400DA20: _dl_init (dl-init.c:134)
>> ==25753==    by 0x400088E: ??? (in /lib/ld-2.10.1.so)
>> ==25753==
>> ==25753== LEAK SUMMARY:
>> ==25753==    definitely lost: 0 bytes in 0 blocks
>> ==25753==    indirectly lost: 0 bytes in 0 blocks
>> ==25753==      possibly lost: 0 bytes in 0 blocks
>> ==25753==    still reachable: 264 bytes in 8 blocks
>> ==25753==         suppressed: 0 bytes in 0 blocks
>> ==25753==
>> ==25753== For counts of detected and suppressed errors, rerun with: -v 
>> ==25753== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 27 from 8)
> 
> These are tiny leaks but perhaps one of them is growing into a much larger
> leak with bigger input?
> 
> I will run valgrind on the test program I sent you before and send you the
> output.
> 
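
On the question of whether these small blocks could be the growing leak: every record above is a single block reported as "still reachable", which is what one-time allocations held in static or thread-local storage typically look like. A leak that scales with input would normally show block counts that grow with the number of documents, as in the earlier report further down the thread (29,986 blocks from one call site). The sketch below contrasts the two patterns using a hypothetical Tracked type; it is not CLucene code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical instrumented type: counts how many instances are alive.
struct Tracked {
    static std::size_t live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
std::size_t Tracked::live = 0;

// One-time initialization: allocated on first use, kept in a static pointer
// and never freed. Valgrind would list this block as "still reachable".
Tracked* sharedInstance() {
    static Tracked* instance = new Tracked();
    return instance;
}

// A real leak: the only pointer to the block is discarded on every call.
// Valgrind would list these blocks as "definitely lost".
void leakyStep() {
    Tracked* t = new Tracked();
    (void)t;  // pointer goes out of scope without delete
}

// Number of additional live blocks after calling sharedInstance() repeatedly.
std::size_t growthFromShared(int iterations) {
    assert(iterations >= 0);
    const std::size_t before = Tracked::live;
    for (int i = 0; i < iterations; ++i) sharedInstance();
    return Tracked::live - before;
}

// Number of additional live blocks after calling leakyStep() repeatedly.
std::size_t growthFromLeak(int iterations) {
    assert(iterations >= 0);
    const std::size_t before = Tracked::live;
    for (int i = 0; i < iterations; ++i) leakyStep();
    return Tracked::live - before;
}
```

The shared instance adds at most one block no matter how many times it is requested, while the leaky step adds one block per call. One caveat: memory that keeps growing but is still referenced, and is released when the writer is closed, would not appear in a leak report at all, so a clean report does not by itself explain steadily rising RAM.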
> 
> Michael Levin wrote:
>> It looks like the leaks have been fixed! Thank you so much!! :)
>>
>> Itamar Syn-Hershko wrote:
>>> Thanks. 31MB+ is a serious leak indeed...
>>>
>>> The issue was with an incomplete implementation of the internal
>>> reusableTokenStream; it was only visible when more than one analyzer was
>>> added, and only with StandardAnalyzer. A similar issue may exist with
>>> StopAnalyzer; I fixed it there as well but haven't tested it.
>>>
>>> I didn't have time to test this thoroughly, so please let me know if
>>> there are any more issues with this.
>>>
>>> Updated git master with latest code.
>>>
>>> Itamar.
>>>
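
The two valgrind traces in this thread show the difference the fix made. Before it, Analyzer::reusableTokenStream fell through to PerFieldAnalyzerWrapper::tokenStream, which allocated a new stream on every call; afterwards, PerFieldAnalyzerWrapper::reusableTokenStream forwards to the sub-analyzer's own reusableTokenStream. A minimal sketch of that pattern follows, using hypothetical stand-in classes (TokenStream, Analyzer, WhitespaceLikeAnalyzer, PerFieldWrapper) rather than the real CLucene types or the actual patch:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins; these are NOT the CLucene classes.
struct TokenStream {
    std::string text;  // input currently being tokenized
    void reset(const std::string& input) { text = input; }
};

class Analyzer {
public:
    virtual ~Analyzer() = default;
    // Returns a stream owned by the analyzer; callers must not delete it.
    virtual TokenStream* reusableTokenStream(const std::string& field,
                                             const std::string& input) = 0;
};

class WhitespaceLikeAnalyzer : public Analyzer {
    std::unique_ptr<TokenStream> cached_;  // allocated once, then reused
public:
    static int allocations;                // counts streams ever created
    TokenStream* reusableTokenStream(const std::string&,
                                     const std::string& input) override {
        if (!cached_) {
            cached_.reset(new TokenStream());
            ++allocations;
        }
        cached_->reset(input);  // rebind the existing stream to the new input
        return cached_.get();
    }
};
int WhitespaceLikeAnalyzer::allocations = 0;

class PerFieldWrapper : public Analyzer {
    std::unique_ptr<Analyzer> fallback_;
    std::map<std::string, std::unique_ptr<Analyzer>> perField_;
public:
    explicit PerFieldWrapper(Analyzer* fallback) : fallback_(fallback) {
        assert(fallback != nullptr);
    }
    void addAnalyzer(const std::string& field, Analyzer* analyzer) {
        perField_[field].reset(analyzer);  // wrapper takes ownership
    }
    // Forward to the sub-analyzer's reusable stream instead of creating one.
    TokenStream* reusableTokenStream(const std::string& field,
                                     const std::string& input) override {
        auto it = perField_.find(field);
        Analyzer* target =
            (it != perField_.end()) ? it->second.get() : fallback_.get();
        return target->reusableTokenStream(field, input);
    }
};

// Tokenizes three fields per document and reports how many streams were
// allocated in total across all analyzers.
int streamsAllocatedFor(int documents) {
    WhitespaceLikeAnalyzer::allocations = 0;
    PerFieldWrapper wrapper(new WhitespaceLikeAnalyzer());
    wrapper.addAnalyzer("First", new WhitespaceLikeAnalyzer());
    wrapper.addAnalyzer("Second", new WhitespaceLikeAnalyzer());
    for (int i = 0; i < documents; ++i) {
        wrapper.reusableTokenStream("First", "Blah blah blah");
        wrapper.reusableTokenStream("Second", "Blah blah-- blah");
        wrapper.reusableTokenStream("Other", "Blah123 blah blah");
    }
    return WhitespaceLikeAnalyzer::allocations;
}
```

With forwarding in place, the number of streams depends on the number of analyzers (three here), not on the number of documents, which is the property the leaking version lacked.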
>>> -----Original Message-----
>>> From: Michael Levin [mailto:mele...@stanford.edu] 
>>> Sent: Wednesday, November 04, 2009 9:12 AM
>>> To: clucene-developers@lists.sourceforge.net
>>> Subject: Re: [CLucene-dev] PerFieldAnalyzerWrapper memory leak
>>>
>>> The problem appears to be using a WhitespaceAnalyzer inside of a
>>> PerFieldAnalyzerWrapper. Try running this program and change the analyzer
>>> sub-type by toggling the #defines:
>>>
>>>
>>> #include <cstdio>
>>> #include <CLucene.h>
>>>
>>> #define INDEX_PATH "index"
>>> #define USE_PER_FIELD_ANALYZER
>>> #define SUB_ANALYZER_TYPE lucene::analysis::WhitespaceAnalyzer
>>> //#define SUB_ANALYZER_TYPE lucene::analysis::standard::StandardAnalyzer
>>>
>>> int main(int argc, char *argv[]) {
>>>    try {
>>> #ifdef USE_PER_FIELD_ANALYZER
>>>      lucene::analysis::PerFieldAnalyzerWrapper analyzer(
>>>        _CLNEW lucene::analysis::standard::StandardAnalyzer());
>>>      analyzer.addAnalyzer(_T("First"), _CLNEW SUB_ANALYZER_TYPE());
>>>      analyzer.addAnalyzer(_T("Second"), _CLNEW SUB_ANALYZER_TYPE());
>>>      analyzer.addAnalyzer(_T("Third"), _CLNEW SUB_ANALYZER_TYPE());
>>>      analyzer.addAnalyzer(_T("Fourth"), _CLNEW SUB_ANALYZER_TYPE());
>>>      analyzer.addAnalyzer(_T("Fifth"), _CLNEW SUB_ANALYZER_TYPE());
>>> #else
>>>      lucene::analysis::WhitespaceAnalyzer analyzer;
>>> #endif
>>>      lucene::index::IndexWriter writer(INDEX_PATH, &analyzer, true);
>>>      lucene::document::Document doc;
>>>      int flags = lucene::document::Field::STORE_YES
>>>                  | lucene::document::Field::INDEX_TOKENIZED;
>>>      for (int i = 0; i < 1000000; i++) {
>>>        doc.clear();
>>>        doc.add(*(_CLNEW lucene::document::Field(
>>>          _T("First"), _T("Blah blah blah"), flags)));
>>>        doc.add(*(_CLNEW lucene::document::Field(
>>>          _T("Second"), _T("Blah blah-- blah"), flags)));
>>>        doc.add(*(_CLNEW lucene::document::Field(
>>>          _T("Fifth"), _T("Blah blah__ blah"), flags)));
>>>        doc.add(*(_CLNEW lucene::document::Field(
>>>          _T("Eigth"), _T("Blah blah blah++"), flags)));
>>>        doc.add(*(_CLNEW lucene::document::Field(
>>>          _T("Ninth"), _T("Blah123 blah blah"), flags)));
>>>        writer.addDocument(&doc);
>>>      }
>>>      writer.close();
>>>    } catch (CLuceneError& err) {
>>>      printf("CLuceneError: %s\n", err.what());
>>>    }
>>>    return 0;
>>> }
>>>
>>>
>>> Running valgrind gives this:
>>>> ==5003== Memcheck, a memory error detector
>>>> ==5003== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
>>>> ==5003== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
>>>> ==5003== Command: ./testcl
>>>> ==5003== Parent PID: 25703
>>>> ==5003==
>>>> ==5003==
>>>> ==5003== HEAP SUMMARY:
>>>> ==5003==     in use at exit: 31,840,378 bytes in 50,010 blocks
>>>> ==5003==   total heap usage: 231,219 allocs, 181,209 frees, 39,843,697 bytes allocated
>>>> ==5003==
>>>> ==5003== 254 (32 direct, 222 indirect) bytes in 1 blocks are definitely
>>> lost in loss record 10 of 13
>>>> ==5003==    at 0x4025390: operator new(unsigned int)
>>> (vg_replace_malloc.c:214)
>>>> ==5003==    by 0x41D8C6D: lucene::store::FSDirectory::getDirectory(char
>>> const*, bool, lucene::store::LockFactory*) (FSDirectory.cpp:485)
>>>> ==5003==    by 0x42375F8: lucene::index::IndexWriter::IndexWriter(char
>>> const*, lucene::analysis::Analyzer*, bool) (IndexWriter.cpp:152)
>>>> ==5003==    by 0x80490D9: main (testcl.cc:23)
>>>> ==5003==
>>>> ==5003== 14,672 bytes in 14 blocks are possibly lost in loss record 11 of 13
>>>> ==5003==    at 0x4025390: operator new(unsigned int)
>>> (vg_replace_malloc.c:214)
>>>> ==5003==    by 0x41CCDC2:
>>> lucene::analysis::WhitespaceAnalyzer::tokenStream(wchar_t const*,
>>> lucene::util::Reader*) (Analyzers.cpp:113)
>>>> ==5003==    by 0x41CC309:
>>> lucene::analysis::PerFieldAnalyzerWrapper::tokenStream(wchar_t const*,
>>> lucene::util::Reader*) (Analyzers.cpp:298)
>>>> ==5003==    by 0x41CFFCE:
>>> lucene::analysis::Analyzer::reusableTokenStream(wchar_t const*,
>>> lucene::util::Reader*) (AnalysisHeader.cpp:36)
>>>> ==5003==    by 0x4206228:
> lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::
>>> document::Field*, lucene::analysis::Analyzer*, int)
>>> (DocumentsWriterThreadState.cpp:889)
>>>> ==5003==    by 0x42082A0:
> lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene:
>>> :analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>>>> ==5003==    by 0x42086B6:
> lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysi
>>> s::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>>>> ==5003==    by 0x41FE293:
> lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*,
>>> lucene::analysis::Analyzer*, lucene::index::Term*)
> (DocumentsWriter.cpp:934)
>>>> ==5003==    by 0x41FE406:
>>> lucene::index::DocumentsWriter::addDocument(lucene::document::Document*,
>>> lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>>>> ==5003==    by 0x423BE41:
>>> lucene::index::IndexWriter::addDocument(lucene::document::Document*,
>>> lucene::analysis::Analyzer*) (IndexWriter.cpp:668)
>>>> ==5003==    by 0x8049331: main (testcl.cc:39)
>>>> ==5003==
>>>> ==5003== 400,000 bytes in 20,000 blocks are definitely lost in loss record 12 of 13
>>>> ==5003==    at 0x4025390: operator new(unsigned int)
>>> (vg_replace_malloc.c:214)
>>>> ==5003==    by 0x41C9AA0:
>>> lucene::analysis::standard::StandardAnalyzer::tokenStream(wchar_t const*,
>>> lucene::util::Reader*) (StandardAnalyzer.cpp:64)
>>>> ==5003==    by 0x41CC309:
>>> lucene::analysis::PerFieldAnalyzerWrapper::tokenStream(wchar_t const*,
>>> lucene::util::Reader*) (Analyzers.cpp:298)
>>>> ==5003==    by 0x41CFFCE:
>>> lucene::analysis::Analyzer::reusableTokenStream(wchar_t const*,
>>> lucene::util::Reader*) (AnalysisHeader.cpp:36)
>>>> ==5003==    by 0x4206228:
> lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::
>>> document::Field*, lucene::analysis::Analyzer*, int)
>>> (DocumentsWriterThreadState.cpp:889)
>>>> ==5003==    by 0x42082A0:
> lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene:
>>> :analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>>>> ==5003==    by 0x42086B6:
> lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysi
>>> s::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>>>> ==5003==    by 0x41FE293:
> lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*,
>>> lucene::analysis::Analyzer*, lucene::index::Term*)
> (DocumentsWriter.cpp:934)
>>>> ==5003==    by 0x41FE406:
>>> lucene::index::DocumentsWriter::addDocument(lucene::document::Document*,
>>> lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>>>> ==5003==    by 0x423BE41:
>>> lucene::index::IndexWriter::addDocument(lucene::document::Document*,
>>> lucene::analysis::Analyzer*) (IndexWriter.cpp:668)
>>>> ==5003==    by 0x8049331: main (testcl.cc:39)
>>>> ==5003==
>>>> ==5003== 31,425,328 bytes in 29,986 blocks are definitely lost in loss
>>> record 13 of 13
>>>> ==5003==    at 0x4025390: operator new(unsigned int)
>>> (vg_replace_malloc.c:214)
>>>> ==5003==    by 0x41CCDC2:
>>> lucene::analysis::WhitespaceAnalyzer::tokenStream(wchar_t const*,
>>> lucene::util::Reader*) (Analyzers.cpp:113)
>>>> ==5003==    by 0x41CC309:
>>> lucene::analysis::PerFieldAnalyzerWrapper::tokenStream(wchar_t const*,
>>> lucene::util::Reader*) (Analyzers.cpp:298)
>>>> ==5003==    by 0x41CFFCE:
>>> lucene::analysis::Analyzer::reusableTokenStream(wchar_t const*,
>>> lucene::util::Reader*) (AnalysisHeader.cpp:36)
>>>> ==5003==    by 0x4206228:
> lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::
>>> document::Field*, lucene::analysis::Analyzer*, int)
>>> (DocumentsWriterThreadState.cpp:889)
>>>> ==5003==    by 0x42082A0:
> lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene:
>>> :analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>>>> ==5003==    by 0x42086B6:
> lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysi
>>> s::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>>>> ==5003==    by 0x41FE293:
> lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*,
>>> lucene::analysis::Analyzer*, lucene::index::Term*)
> (DocumentsWriter.cpp:934)
>>>> ==5003==    by 0x41FE406:
>>> lucene::index::DocumentsWriter::addDocument(lucene::document::Document*,
>>> lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>>>> ==5003==    by 0x423BE41:
>>> lucene::index::IndexWriter::addDocument(lucene::document::Document*,
>>> lucene::analysis::Analyzer*) (IndexWriter.cpp:668)
>>>> ==5003==    by 0x8049331: main (testcl.cc:39)
>>>> ==5003==
>>>> ==5003== LEAK SUMMARY:
>>>> ==5003==    definitely lost: 31,825,360 bytes in 49,987 blocks
>>>> ==5003==    indirectly lost: 222 bytes in 5 blocks
>>>> ==5003==      possibly lost: 14,672 bytes in 14 blocks
>>>> ==5003==    still reachable: 124 bytes in 4 blocks
>>>> ==5003==         suppressed: 0 bytes in 0 blocks
>>>> ==5003== Reachable blocks (those to which a pointer was found) are not shown.
>>>> ==5003== To see them, rerun with: --leak-check=full --show-reachable=yes
>>>> ==5003==
>>>> ==5003== For counts of detected and suppressed errors, rerun with: -v
>>>> ==5003== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 27 from 8)
>>> Thanks for looking into this!
>>>
>>>
>>> Itamar Syn-Hershko wrote:
>>>> Hi,
>>>>
>>>> I ran TestAnalyzers.cpp (specifically testPerFieldAnalzyerWrapper() ) 
>>>> from our test suite, and detected no leaks. I also tried replacing
>>>>
>>>>    analyzer.addAnalyzer(_T("special"), _CLNEW SimpleAnalyzer());
>>>>
>>>> With
>>>>
>>>>    analyzer.addAnalyzer(_T("special"), _CLNEW StandardAnalyzer());
>>>>
>>>> And still found nothing.
>>>>
>>>> I used our 2_3_2 master branch from the git repository (see 
>>>> http://clucene.sourceforge.net/download.shtml).
>>>>
>>>> If you're using this branch, please let me know the details of the 
>>>> leaks you're detecting.
>>>>
>>>> Itamar. 
>>>>
>>>> -----Original Message-----
>>>> From: Michael Levin [mailto:mele...@stanford.edu]
>>>> Sent: Monday, November 02, 2009 8:47 PM
>>>> To: clucene-developers@lists.sourceforge.net
>>>> Subject: [CLucene-dev] PerFieldAnalyzerWrapper memory leak
>>>>
>>>> Hi,
>>>>
>>>> I am working on a program to index about 25gb of data and when I run 
>>>> CLucene with a PerFieldAnalyzerWrapper it leaks memory and inevitably 
>>>> crashes because it runs out of memory.
>>>>
>>>> Here is my code:
>>>>
>>>> lucene::analysis::PerFieldAnalyzerWrapper
>>>>    analyzer(new lucene::analysis::standard::StandardAnalyzer());
>>>> analyzer.addAnalyzer(_T("Authors"),
>>>>    new lucene::analysis::WhitespaceAnalyzer());
>>>> analyzer.addAnalyzer(_T("ReprintAuthor"),
>>>>    new lucene::analysis::WhitespaceAnalyzer());
>>>> analyzer.addAnalyzer(_T("Name"),
>>>>    new lucene::analysis::WhitespaceAnalyzer());
>>>> analyzer.addAnalyzer(_T("Email"),
>>>>    new lucene::analysis::WhitespaceAnalyzer());
>>>>
>>>> If I replace that snippet with a plain WhitespaceAnalyzer there is no 
>>>> memory
>>>> leak:
>>>>
>>>> lucene::analysis::WhitespaceAnalyzer analyzer;
>>>>
>>>> Am I using the PerFieldAnalyzerWrapper class wrong or is this a bug in 
>>>> CLucene?
>>>>
>>>> Thanks!

-- 
Michael Levin <mele...@stanford.edu>
