Yes, you will get these same leaks in valgrind if you just run the test code without modification. They are quite small, though, so I don't know whether they are what is causing the problem or whether there is another major leak that only shows up with a large volume of data. I will try to come up with a better test case, but I would appreciate it if you would take a look at the smaller leaks as well. Thank you!
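To make the question above concrete, here is a minimal sketch of the reasoning: allocations that happen once at startup and are never freed give the same "in use at exit" figure for any input size, while a per-document leak grows with the number of documents. This is only an accounting model; `HeapModel`, `bytesInUseAtExit` and `growsWithInput` are hypothetical names and none of it is CLucene or valgrind code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical accounting model: it only adds and subtracts byte counts and
// does not touch the real heap or any CLucene code.
struct HeapModel {
    std::size_t inUse = 0;
    void alloc(std::size_t n) { inUse += n; }
    void release(std::size_t n) {
        assert(n <= inUse);
        inUse -= n;
    }
};

// Bytes still in use after "indexing" `docs` documents.
// fixedStartup: one-time allocations that are never released.
// leakPerDoc:   bytes allocated for each document and never released.
std::size_t bytesInUseAtExit(int docs, std::size_t fixedStartup,
                             std::size_t leakPerDoc) {
    HeapModel heap;
    heap.alloc(fixedStartup);
    for (int i = 0; i < docs; ++i) {
        heap.alloc(1024);        // per-document working buffer (arbitrary size)
        heap.alloc(leakPerDoc);  // the part that is never released
        heap.release(1024);      // the working buffer is returned
    }
    return heap.inUse;
}

// True when usage at exit depends on how many documents were indexed.
bool growsWithInput(std::size_t fixedStartup, std::size_t leakPerDoc) {
    return bytesInUseAtExit(10000, fixedStartup, leakPerDoc) >
           bytesInUseAtExit(100, fixedStartup, leakPerDoc);
}
```

In practice the equivalent check would be to run valgrind on two index sizes and compare the "in use at exit" line: if it stays at a few hundred bytes, the small blocks are probably not what is eating the RAM.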
Itamar Syn-Hershko wrote:
> Michael,
>
> Looking at this report, can this be run and traced using the code you sent?
> Since I doubt it, can you send a small test function to test against?
>
> Itamar.
>
> -----Original Message-----
> From: Michael Levin [mailto:mele...@stanford.edu]
> Sent: Sunday, November 15, 2009 7:48 AM
> To: clucene-developers@lists.sourceforge.net
> Subject: Re: [CLucene-dev] PerFieldAnalyzerWrapper memory leak
>
> Itamar,
>
> Sorry to bother you about this again but I am rebuilding my 47gb index and I
> think there is a memory leak in CLucene again. CLucene is eating up 1.4gb of
> RAM and steadily rising. The leak is not as severe as the previous one but
> it is still preventing indexing from finishing on my machine.
>
> I ran valgrind while building a smaller index and it reported the following
> leaks in my indexer:
>
>> ==25753== Memcheck, a memory error detector
>> ==25753== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
>> ==25753== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
>> ==25753== Command: ./index -d data/test1 -o data/test1/articles -a data/test1/authors -s data/stopwords.txt -O -R 128
>> ==25753== Parent PID: 2683
>> ==25753==
>> ==25753==
>> ==25753== HEAP SUMMARY:
>> ==25753== in use at exit: 264 bytes in 8 blocks
>> ==25753== total heap usage: 1,892,514 allocs, 1,892,506 frees, 994,487,471 bytes allocated
>> ==25753==
>> ==25753== 4 bytes in 1 blocks are still reachable in loss record 1 of 8
>> ==25753== at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>> ==25753== by 0x426D7DB: lucene::search::Similarity::getDefault() (Similarity.cpp:143)
>> ==25753== by 0x4249172: lucene::index::IndexWriter::init(lucene::store::Directory*, lucene::analysis::Analyzer*, bool, bool, lucene::index::IndexDeletionPolicy*, bool) (IndexWriter.cpp:201)
>> ==25753== by 0x4249C39: lucene::index::IndexWriter::IndexWriter(char const*, lucene::analysis::Analyzer*, bool) (IndexWriter.cpp:153)
>> ==25753== by 0x8050841: main (index.cc:292)
>> ==25753==
>> ==25753== 12 bytes in 1 blocks are still reachable in loss record 2 of 8
>> ==25753== at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>> ==25753== by 0x422104D: global constructors keyed to TermVectorReader.cpp (TermVectorReader.cpp:417)
>> ==25753== by 0x429184C: ??? (in /usr/local/lib/libclucene-core.so.0.9.23.0)
>> ==25753== by 0x41C2553: ??? (in /usr/local/lib/libclucene-core.so.0.9.23.0)
>> ==25753== by 0x400D8BB: call_init (dl-init.c:70)
>> ==25753== by 0x400DA20: _dl_init (dl-init.c:134)
>> ==25753== by 0x400088E: ??? (in /lib/ld-2.10.1.so)
>> ==25753==
>> ==25753== 20 bytes in 1 blocks are still reachable in loss record 3 of 8
>> ==25753== at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>> ==25753== by 0x41C76FB: lucene::util::_ThreadLocal::set(void*) (ThreadLocal.cpp:177)
>> ==25753== by 0x41E9D38: lucene::analysis::Analyzer::setPreviousTokenStream(void*) (_ThreadLocal.h:82)
>> ==25753== by 0x41E5F18: lucene::analysis::WhitespaceAnalyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:120)
>> ==25753== by 0x41E7282: lucene::analysis::PerFieldAnalyzerWrapper::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:327)
>> ==25753== by 0x421D0F8: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
>> ==25753== by 0x421F170: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>> ==25753== by 0x421F586: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>> ==25753== by 0x4212F73: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>> ==25753== by 0x42130C6: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>> ==25753== by 0x4251011: lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (IndexWriter.cpp:670)
>> ==25753== by 0x804FBFD: indexArticle(lucene::index::IndexWriter*, lucene::index::IndexWriter*, lucene::document::Document*, ISIIssue*, ISIArticle*, char const*, char const*) (index.cc:122)
>> ==25753==
>> ==25753== 24 bytes in 1 blocks are still reachable in loss record 4 of 8
>> ==25753== at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>> ==25753== by 0x41C77A4: lucene::util::_ThreadLocal::set(void*) (new_allocator.h:89)
>> ==25753== by 0x41E9D38: lucene::analysis::Analyzer::setPreviousTokenStream(void*) (_ThreadLocal.h:82)
>> ==25753== by 0x41E5F18: lucene::analysis::WhitespaceAnalyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:120)
>> ==25753== by 0x41E7282: lucene::analysis::PerFieldAnalyzerWrapper::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:327)
>> ==25753== by 0x421D0F8: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
>> ==25753== by 0x421F170: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>> ==25753== by 0x421F586: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>> ==25753== by 0x4212F73: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>> ==25753== by 0x42130C6: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>> ==25753== by 0x4251011: lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (IndexWriter.cpp:670)
>> ==25753== by 0x804FBFD: indexArticle(lucene::index::IndexWriter*, lucene::index::IndexWriter*, lucene::document::Document*, ISIIssue*, ISIArticle*, char const*, char const*) (index.cc:122)
>> ==25753==
>> ==25753== 32 bytes in 1 blocks are still reachable in loss record 5 of 8
>> ==25753== at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>> ==25753== by 0x4284B8D: global constructors keyed to FieldSortedHitQueue.cpp (FieldSortedHitQueue.cpp:56)
>> ==25753== by 0x429184C: ??? (in /usr/local/lib/libclucene-core.so.0.9.23.0)
>> ==25753== by 0x41C2553: ??? (in /usr/local/lib/libclucene-core.so.0.9.23.0)
>> ==25753== by 0x400D8BB: call_init (dl-init.c:70)
>> ==25753== by 0x400DA20: _dl_init (dl-init.c:134)
>> ==25753== by 0x400088E: ??? (in /lib/ld-2.10.1.so)
>> ==25753==
>> ==25753== 32 bytes in 1 blocks are still reachable in loss record 6 of 8
>> ==25753== at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>> ==25753== by 0x41C77EB: lucene::util::_ThreadLocal::set(void*) (ThreadLocal.cpp:173)
>> ==25753== by 0x41E9D38: lucene::analysis::Analyzer::setPreviousTokenStream(void*) (_ThreadLocal.h:82)
>> ==25753== by 0x41E5F18: lucene::analysis::WhitespaceAnalyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:120)
>> ==25753== by 0x41E7282: lucene::analysis::PerFieldAnalyzerWrapper::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:327)
>> ==25753== by 0x421D0F8: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
>> ==25753== by 0x421F170: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>> ==25753== by 0x421F586: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>> ==25753== by 0x4212F73: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>> ==25753== by 0x42130C6: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>> ==25753== by 0x4251011: lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (IndexWriter.cpp:670)
>> ==25753== by 0x804FBFD: indexArticle(lucene::index::IndexWriter*, lucene::index::IndexWriter*, lucene::document::Document*, ISIIssue*, ISIArticle*, char const*, char const*) (index.cc:122)
>> ==25753==
>> ==25753== 64 bytes in 1 blocks are still reachable in loss record 7 of 8
>> ==25753== at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>> ==25753== by 0x41C851F: std::vector<lucene::util::_ThreadLocal*, std::allocator<lucene::util::_ThreadLocal*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<lucene::util::_ThreadLocal**, std::vector<lucene::util::_ThreadLocal*, std::allocator<lucene::util::_ThreadLocal*> > >, lucene::util::_ThreadLocal* const&) (new_allocator.h:89)
>> ==25753== by 0x41C74CD: lucene::util::ThreadLocals::add(lucene::util::_ThreadLocal*) (stl_vector.h:741)
>> ==25753== by 0x41C7586: lucene::util::_ThreadLocal::set(void*) (ThreadLocal.cpp:180)
>> ==25753== by 0x41E9D38: lucene::analysis::Analyzer::setPreviousTokenStream(void*) (_ThreadLocal.h:82)
>> ==25753== by 0x41E5F18: lucene::analysis::WhitespaceAnalyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:120)
>> ==25753== by 0x41E7282: lucene::analysis::PerFieldAnalyzerWrapper::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:327)
>> ==25753== by 0x421D0F8: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
>> ==25753== by 0x421F170: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>> ==25753== by 0x421F586: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>> ==25753== by 0x4212F73: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>> ==25753== by 0x42130C6: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>> ==25753==
>> ==25753== 76 bytes in 1 blocks are still reachable in loss record 8 of 8
>> ==25753== at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>> ==25753== by 0x41FBA32: global constructors keyed to IndexFileNameFilter.cpp (IndexFileNameFilter.cpp:8)
>> ==25753== by 0x429184C: ??? (in /usr/local/lib/libclucene-core.so.0.9.23.0)
>> ==25753== by 0x41C2553: ??? (in /usr/local/lib/libclucene-core.so.0.9.23.0)
>> ==25753== by 0x400D8BB: call_init (dl-init.c:70)
>> ==25753== by 0x400DA20: _dl_init (dl-init.c:134)
>> ==25753== by 0x400088E: ??? (in /lib/ld-2.10.1.so)
>> ==25753==
>> ==25753== LEAK SUMMARY:
>> ==25753== definitely lost: 0 bytes in 0 blocks
>> ==25753== indirectly lost: 0 bytes in 0 blocks
>> ==25753== possibly lost: 0 bytes in 0 blocks
>> ==25753== still reachable: 264 bytes in 8 blocks
>> ==25753== suppressed: 0 bytes in 0 blocks
>> ==25753==
>> ==25753== For counts of detected and suppressed errors, rerun with: -v
>> ==25753== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 27 from 8)
>
> These are tiny leaks but perhaps one of them is growing into a much larger
> leak with bigger input?
>
> I will run valgrind on the test program I sent you before and send you the
> output.
>
>
> Michael Levin wrote:
>> It looks like the leaks have been fixed! Thank you so much!! :)
>>
>> Itamar Syn-Hershko wrote:
>>> Thanks. 31MB+ is a serious leak indeed...
>>>
>>> The issue was with an incomplete implementation of the internal
>>> reusableTokenStream; it was only visible when more than one analyzer was
>>> added, and only with StandardAnalyzer. A similar issue may exist with
>>> StopAnalyzer; I fixed it there as well but haven't tested it.
>>>
>>> I didn't have time to thoroughly test this, so please let me know if there
>>> are any more issues with this.
>>>
>>> Updated git master with latest code.
>>>
>>> Itamar.
>>>
>>> -----Original Message-----
>>> From: Michael Levin [mailto:mele...@stanford.edu]
>>> Sent: Wednesday, November 04, 2009 9:12 AM
>>> To: clucene-developers@lists.sourceforge.net
>>> Subject: Re: [CLucene-dev] PerFieldAnalyzerWrapper memory leak
>>>
>>> The problem appears to be using a WhitespaceAnalyzer inside of a
>>> PerFieldAnalyzerWrapper. Try running this program and change the analyzer
>>> sub-type by toggling the #defines:
>>>
>>>
>>> #include <cstdio>
>>> #include <CLucene.h>
>>>
>>> #define INDEX_PATH "index"
>>> #define USE_PER_FIELD_ANALYZER
>>> #define SUB_ANALYZER_TYPE lucene::analysis::WhitespaceAnalyzer
>>> //#define SUB_ANALYZER_TYPE lucene::analysis::standard::StandardAnalyzer
>>>
>>> int main(int argc, char *argv[]) {
>>>   try {
>>> #ifdef USE_PER_FIELD_ANALYZER
>>>     lucene::analysis::PerFieldAnalyzerWrapper analyzer(
>>>         _CLNEW lucene::analysis::standard::StandardAnalyzer());
>>>     analyzer.addAnalyzer(_T("First"), _CLNEW SUB_ANALYZER_TYPE());
>>>     analyzer.addAnalyzer(_T("Second"), _CLNEW SUB_ANALYZER_TYPE());
>>>     analyzer.addAnalyzer(_T("Third"), _CLNEW SUB_ANALYZER_TYPE());
>>>     analyzer.addAnalyzer(_T("Fourth"), _CLNEW SUB_ANALYZER_TYPE());
>>>     analyzer.addAnalyzer(_T("Fifth"), _CLNEW SUB_ANALYZER_TYPE());
>>> #else
>>>     lucene::analysis::WhitespaceAnalyzer analyzer;
>>> #endif
>>>     lucene::index::IndexWriter writer(INDEX_PATH, &analyzer, true);
>>>     lucene::document::Document doc;
>>>     int flags = lucene::document::Field::STORE_YES
>>>         | lucene::document::Field::INDEX_TOKENIZED;
>>>     for (int i = 0; i < 1000000; i++) {
>>>       doc.clear();
>>>       doc.add(*(_CLNEW lucene::document::Field(
>>>           _T("First"), _T("Blah blah blah"), flags)));
>>>       doc.add(*(_CLNEW lucene::document::Field(
>>>           _T("Second"), _T("Blah blah-- blah"), flags)));
>>>       doc.add(*(_CLNEW lucene::document::Field(
>>>           _T("Fifth"), _T("Blah blah__ blah"), flags)));
>>>       doc.add(*(_CLNEW lucene::document::Field(
>>>           _T("Eigth"), _T("Blah blah blah++"), flags)));
>>>       doc.add(*(_CLNEW lucene::document::Field(
>>>           _T("Ninth"), _T("Blah123 blah blah"), flags)));
>>>       writer.addDocument(&doc);
>>>     }
>>>     writer.close();
>>>   } catch (CLuceneError& err) {
>>>     printf("CLuceneError: %s\n", err.what());
>>>   }
>>>   return 0;
>>> }
>>>
>>>
>>> Running valgrind gives this:
>>>> ==5003== Memcheck, a memory error detector
>>>> ==5003== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
>>>> ==5003== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
>>>> ==5003== Command: ./testcl
>>>> ==5003== Parent PID: 25703
>>>> ==5003==
>>>> ==5003==
>>>> ==5003== HEAP SUMMARY:
>>>> ==5003== in use at exit: 31,840,378 bytes in 50,010 blocks
>>>> ==5003== total heap usage: 231,219 allocs, 181,209 frees, 39,843,697 bytes allocated
>>>> ==5003==
>>>> ==5003== 254 (32 direct, 222 indirect) bytes in 1 blocks are definitely lost in loss record 10 of 13
>>>> ==5003== at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>>>> ==5003== by 0x41D8C6D: lucene::store::FSDirectory::getDirectory(char const*, bool, lucene::store::LockFactory*) (FSDirectory.cpp:485)
>>>> ==5003== by 0x42375F8: lucene::index::IndexWriter::IndexWriter(char const*, lucene::analysis::Analyzer*, bool) (IndexWriter.cpp:152)
>>>> ==5003== by 0x80490D9: main (testcl.cc:23)
>>>> ==5003==
>>>> ==5003== 14,672 bytes in 14 blocks are possibly lost in loss record 11 of 13
>>>> ==5003== at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>>>> ==5003== by 0x41CCDC2: lucene::analysis::WhitespaceAnalyzer::tokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:113)
>>>> ==5003== by 0x41CC309: lucene::analysis::PerFieldAnalyzerWrapper::tokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:298)
>>>> ==5003== by 0x41CFFCE: lucene::analysis::Analyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (AnalysisHeader.cpp:36)
>>>> ==5003== by 0x4206228: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
>>>> ==5003== by 0x42082A0: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>>>> ==5003== by 0x42086B6: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>>>> ==5003== by 0x41FE293: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>>>> ==5003== by 0x41FE406: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>>>> ==5003== by 0x423BE41: lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (IndexWriter.cpp:668)
>>>> ==5003== by 0x8049331: main (testcl.cc:39)
>>>> ==5003==
>>>> ==5003== 400,000 bytes in 20,000 blocks are definitely lost in loss record 12 of 13
>>>> ==5003== at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>>>> ==5003== by 0x41C9AA0: lucene::analysis::standard::StandardAnalyzer::tokenStream(wchar_t const*, lucene::util::Reader*) (StandardAnalyzer.cpp:64)
>>>> ==5003== by 0x41CC309: lucene::analysis::PerFieldAnalyzerWrapper::tokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:298)
>>>> ==5003== by 0x41CFFCE: lucene::analysis::Analyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (AnalysisHeader.cpp:36)
>>>> ==5003== by 0x4206228: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
>>>> ==5003== by 0x42082A0: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>>>> ==5003== by 0x42086B6: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>>>> ==5003== by 0x41FE293: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>>>> ==5003== by 0x41FE406: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>>>> ==5003== by 0x423BE41: lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (IndexWriter.cpp:668)
>>>> ==5003== by 0x8049331: main (testcl.cc:39)
>>>> ==5003==
>>>> ==5003== 31,425,328 bytes in 29,986 blocks are definitely lost in loss record 13 of 13
>>>> ==5003== at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>>>> ==5003== by 0x41CCDC2: lucene::analysis::WhitespaceAnalyzer::tokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:113)
>>>> ==5003== by 0x41CC309: lucene::analysis::PerFieldAnalyzerWrapper::tokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:298)
>>>> ==5003== by 0x41CFFCE: lucene::analysis::Analyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (AnalysisHeader.cpp:36)
>>>> ==5003== by 0x4206228: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
>>>> ==5003== by 0x42082A0: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>>>> ==5003== by 0x42086B6: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>>>> ==5003== by 0x41FE293: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>>>> ==5003== by 0x41FE406: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>>>> ==5003== by 0x423BE41: lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (IndexWriter.cpp:668)
>>>> ==5003== by 0x8049331: main (testcl.cc:39)
>>>> ==5003==
>>>> ==5003== LEAK SUMMARY:
>>>> ==5003== definitely lost: 31,825,360 bytes in 49,987 blocks
>>>> ==5003== indirectly lost: 222 bytes in 5 blocks
>>>> ==5003== possibly lost: 14,672 bytes in 14 blocks
>>>> ==5003== still reachable: 124 bytes in 4 blocks
>>>> ==5003== suppressed: 0 bytes in 0 blocks
>>>> ==5003== Reachable blocks (those to which a pointer was found) are not shown.
>>>> ==5003== To see them, rerun with: --leak-check=full --show-reachable=yes
>>>> ==5003==
>>>> ==5003== For counts of detected and suppressed errors, rerun with: -v
>>>> ==5003== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 27 from 8)
>>> Thanks for looking into this!
>>>
>>>
>>> Itamar Syn-Hershko wrote:
>>>> Hi,
>>>>
>>>> I ran TestAnalyzers.cpp (specifically testPerFieldAnalzyerWrapper() )
>>>> from our test suite, and detected no leaks. I also tried replacing
>>>>
>>>> analyzer.addAnalyzer(_T("special"), _CLNEW SimpleAnalyzer());
>>>>
>>>> With
>>>>
>>>> analyzer.addAnalyzer(_T("special"), _CLNEW StandardAnalyzer());
>>>>
>>>> And still found nothing.
>>>>
>>>> I used our 2_3_2 master branch from the git repository (see
>>>> http://clucene.sourceforge.net/download.shtml).
>>>>
>>>> If you're using this branch, please let me know the details of the
>>>> leaks you're detecting.
>>>>
>>>> Itamar.
>>>>
>>>> -----Original Message-----
>>>> From: Michael Levin [mailto:mele...@stanford.edu]
>>>> Sent: Monday, November 02, 2009 8:47 PM
>>>> To: clucene-developers@lists.sourceforge.net
>>>> Subject: [CLucene-dev] PerFieldAnalyzerWrapper memory leak
>>>>
>>>> Hi,
>>>>
>>>> I am working on a program to index about 25gb of data and when I run
>>>> CLucene with a PerFieldAnalyzerWrapper it leaks memory and inevitably
>>>> crashes because it runs out of memory.
>>>>
>>>> Here is my code:
>>>>
>>>> lucene::analysis::PerFieldAnalyzerWrapper
>>>>     analyzer(new lucene::analysis::standard::StandardAnalyzer());
>>>> analyzer.addAnalyzer(_T("Authors"),
>>>>     new lucene::analysis::WhitespaceAnalyzer());
>>>> analyzer.addAnalyzer(_T("ReprintAuthor"),
>>>>     new lucene::analysis::WhitespaceAnalyzer());
>>>> analyzer.addAnalyzer(_T("Name"),
>>>>     new lucene::analysis::WhitespaceAnalyzer());
>>>> analyzer.addAnalyzer(_T("Email"),
>>>>     new lucene::analysis::WhitespaceAnalyzer());
>>>>
>>>> If I replace that snippet with a plain WhitespaceAnalyzer there is no
>>>> memory leak:
>>>>
>>>> lucene::analysis::WhitespaceAnalyzer analyzer;
>>>>
>>>> Am I using the PerFieldAnalyzerWrapper class wrong or is this a bug in
>>>> CLucene?
>>>>
>>>> Thanks!

--
Michael Levin <mele...@stanford.edu>
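For readers of the archive, the "incomplete implementation of the internal reusableTokenStream" mentioned earlier in the thread can be illustrated with a small stand-alone sketch. It is a guess at the pattern based on the stack traces, where the base `Analyzer::reusableTokenStream` ends up in `tokenStream` and allocates on every call. Every class and function below (`TokenStream`, `CachingAnalyzer`, `PerFieldWrapper`, `streamsCreated`) is a hypothetical stand-in, not CLucene's actual code, and the sketch counts created streams rather than leaking real memory.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of this is CLucene's actual code.
struct TokenStream {};

// Owns every stream ever created so the sketch itself frees its memory.
// Its size plays the role of the block count in a valgrind report.
static std::vector<std::unique_ptr<TokenStream>> g_created;

static TokenStream* makeStream() {
    g_created.push_back(std::make_unique<TokenStream>());
    return g_created.back().get();
}

class Analyzer {
public:
    virtual ~Analyzer() = default;
    // Creates a fresh stream on every call.
    virtual TokenStream* tokenStream(const std::string& field) = 0;
    // Default fallback: no caching, so every call creates a new stream.
    virtual TokenStream* reusableTokenStream(const std::string& field) {
        return tokenStream(field);
    }
};

// A sub-analyzer that implements reuse by caching a single stream.
class CachingAnalyzer : public Analyzer {
public:
    TokenStream* tokenStream(const std::string&) override { return makeStream(); }
    TokenStream* reusableTokenStream(const std::string& field) override {
        if (!cached_) cached_ = tokenStream(field);
        return cached_;
    }
private:
    TokenStream* cached_ = nullptr;
};

// Per-field wrapper. With forwardReuse == false it behaves as if it only
// overrode tokenStream(), so the base-class fallback runs on every call.
class PerFieldWrapper : public Analyzer {
public:
    explicit PerFieldWrapper(bool forwardReuse) : forwardReuse_(forwardReuse) {}
    void add(const std::string& field) {
        subs_[field] = std::make_unique<CachingAnalyzer>();
    }
    TokenStream* tokenStream(const std::string& field) override {
        return pick(field).tokenStream(field);
    }
    TokenStream* reusableTokenStream(const std::string& field) override {
        if (!forwardReuse_) return Analyzer::reusableTokenStream(field);
        return pick(field).reusableTokenStream(field);
    }
private:
    Analyzer& pick(const std::string& field) {
        auto it = subs_.find(field);
        if (it != subs_.end()) return *it->second;
        return default_;
    }
    bool forwardReuse_;
    CachingAnalyzer default_;
    std::map<std::string, std::unique_ptr<Analyzer>> subs_;
};

// "Index" `docs` documents with two fields each and return how many
// streams were created along the way.
std::size_t streamsCreated(bool forwardReuse, int docs) {
    assert(docs >= 0);
    g_created.clear();
    PerFieldWrapper analyzer(forwardReuse);
    analyzer.add("First");
    analyzer.add("Second");
    for (int i = 0; i < docs; ++i) {
        analyzer.reusableTokenStream("First");
        analyzer.reusableTokenStream("Second");
    }
    return g_created.size();
}
```

In this model, `streamsCreated(false, 1000)` creates one stream per field per document, while `streamsCreated(true, 1000)` creates one stream per field in total. If the caller treats reusable streams as owned by the analyzer and never deletes them, the first case would show up as a leak that grows with the number of documents, which is at least consistent with the large report quoted above.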