Itamar,

Sorry to bother you about this again, but I am rebuilding my 47 GB index and I think there is another memory leak in CLucene. The indexer is using 1.4 GB of RAM and the figure is rising steadily. The leak is not as severe as the previous one, but it still prevents indexing from finishing on my machine.

I ran valgrind while building a smaller index and it reported the following leaks in my indexer:

> ==25753== Memcheck, a memory error detector
> ==25753== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
> ==25753== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
> ==25753== Command: ./index -d data/test1 -o data/test1/articles -a data/test1/authors -s data/stopwords.txt -O -R 128
> ==25753== Parent PID: 2683
> ==25753==
> ==25753==
> ==25753== HEAP SUMMARY:
> ==25753==     in use at exit: 264 bytes in 8 blocks
> ==25753==   total heap usage: 1,892,514 allocs, 1,892,506 frees, 994,487,471 bytes allocated
> ==25753==
> ==25753== 4 bytes in 1 blocks are still reachable in loss record 1 of 8
> ==25753==    at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
> ==25753==    by 0x426D7DB: lucene::search::Similarity::getDefault() (Similarity.cpp:143)
> ==25753==    by 0x4249172: lucene::index::IndexWriter::init(lucene::store::Directory*, lucene::analysis::Analyzer*, bool, bool, lucene::index::IndexDeletionPolicy*, bool) (IndexWriter.cpp:201)
> ==25753==    by 0x4249C39: lucene::index::IndexWriter::IndexWriter(char const*, lucene::analysis::Analyzer*, bool) (IndexWriter.cpp:153)
> ==25753==    by 0x8050841: main (index.cc:292)
> ==25753==
> ==25753== 12 bytes in 1 blocks are still reachable in loss record 2 of 8
> ==25753==    at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
> ==25753==    by 0x422104D: global constructors keyed to TermVectorReader.cpp (TermVectorReader.cpp:417)
> ==25753==    by 0x429184C: ??? (in /usr/local/lib/libclucene-core.so.0.9.23.0)
> ==25753==    by 0x41C2553: ??? (in /usr/local/lib/libclucene-core.so.0.9.23.0)
> ==25753==    by 0x400D8BB: call_init (dl-init.c:70)
> ==25753==    by 0x400DA20: _dl_init (dl-init.c:134)
> ==25753==    by 0x400088E: ??? (in /lib/ld-2.10.1.so)
> ==25753==
> ==25753== 20 bytes in 1 blocks are still reachable in loss record 3 of 8
> ==25753==    at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
> ==25753==    by 0x41C76FB: lucene::util::_ThreadLocal::set(void*) (ThreadLocal.cpp:177)
> ==25753==    by 0x41E9D38: lucene::analysis::Analyzer::setPreviousTokenStream(void*) (_ThreadLocal.h:82)
> ==25753==    by 0x41E5F18: lucene::analysis::WhitespaceAnalyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:120)
> ==25753==    by 0x41E7282: lucene::analysis::PerFieldAnalyzerWrapper::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:327)
> ==25753==    by 0x421D0F8: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
> ==25753==    by 0x421F170: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
> ==25753==    by 0x421F586: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
> ==25753==    by 0x4212F73: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
> ==25753==    by 0x42130C6: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
> ==25753==    by 0x4251011: lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (IndexWriter.cpp:670)
> ==25753==    by 0x804FBFD: indexArticle(lucene::index::IndexWriter*, lucene::index::IndexWriter*, lucene::document::Document*, ISIIssue*, ISIArticle*, char const*, char const*) (index.cc:122)
> ==25753==
> ==25753== 24 bytes in 1 blocks are still reachable in loss record 4 of 8
> ==25753==    at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
> ==25753==    by 0x41C77A4: lucene::util::_ThreadLocal::set(void*) (new_allocator.h:89)
> ==25753==    by 0x41E9D38: lucene::analysis::Analyzer::setPreviousTokenStream(void*) (_ThreadLocal.h:82)
> ==25753==    by 0x41E5F18: lucene::analysis::WhitespaceAnalyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:120)
> ==25753==    by 0x41E7282: lucene::analysis::PerFieldAnalyzerWrapper::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:327)
> ==25753==    by 0x421D0F8: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
> ==25753==    by 0x421F170: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
> ==25753==    by 0x421F586: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
> ==25753==    by 0x4212F73: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
> ==25753==    by 0x42130C6: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
> ==25753==    by 0x4251011: lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (IndexWriter.cpp:670)
> ==25753==    by 0x804FBFD: indexArticle(lucene::index::IndexWriter*, lucene::index::IndexWriter*, lucene::document::Document*, ISIIssue*, ISIArticle*, char const*, char const*) (index.cc:122)
> ==25753==
> ==25753== 32 bytes in 1 blocks are still reachable in loss record 5 of 8
> ==25753==    at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
> ==25753==    by 0x4284B8D: global constructors keyed to FieldSortedHitQueue.cpp (FieldSortedHitQueue.cpp:56)
> ==25753==    by 0x429184C: ??? (in /usr/local/lib/libclucene-core.so.0.9.23.0)
> ==25753==    by 0x41C2553: ??? (in /usr/local/lib/libclucene-core.so.0.9.23.0)
> ==25753==    by 0x400D8BB: call_init (dl-init.c:70)
> ==25753==    by 0x400DA20: _dl_init (dl-init.c:134)
> ==25753==    by 0x400088E: ??? (in /lib/ld-2.10.1.so)
> ==25753==
> ==25753== 32 bytes in 1 blocks are still reachable in loss record 6 of 8
> ==25753==    at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
> ==25753==    by 0x41C77EB: lucene::util::_ThreadLocal::set(void*) (ThreadLocal.cpp:173)
> ==25753==    by 0x41E9D38: lucene::analysis::Analyzer::setPreviousTokenStream(void*) (_ThreadLocal.h:82)
> ==25753==    by 0x41E5F18: lucene::analysis::WhitespaceAnalyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:120)
> ==25753==    by 0x41E7282: lucene::analysis::PerFieldAnalyzerWrapper::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:327)
> ==25753==    by 0x421D0F8: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
> ==25753==    by 0x421F170: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
> ==25753==    by 0x421F586: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
> ==25753==    by 0x4212F73: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
> ==25753==    by 0x42130C6: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
> ==25753==    by 0x4251011: lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (IndexWriter.cpp:670)
> ==25753==    by 0x804FBFD: indexArticle(lucene::index::IndexWriter*, lucene::index::IndexWriter*, lucene::document::Document*, ISIIssue*, ISIArticle*, char const*, char const*) (index.cc:122)
> ==25753==
> ==25753== 64 bytes in 1 blocks are still reachable in loss record 7 of 8
> ==25753==    at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
> ==25753==    by 0x41C851F: std::vector<lucene::util::_ThreadLocal*, std::allocator<lucene::util::_ThreadLocal*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<lucene::util::_ThreadLocal**, std::vector<lucene::util::_ThreadLocal*, std::allocator<lucene::util::_ThreadLocal*> > >, lucene::util::_ThreadLocal* const&) (new_allocator.h:89)
> ==25753==    by 0x41C74CD: lucene::util::ThreadLocals::add(lucene::util::_ThreadLocal*) (stl_vector.h:741)
> ==25753==    by 0x41C7586: lucene::util::_ThreadLocal::set(void*) (ThreadLocal.cpp:180)
> ==25753==    by 0x41E9D38: lucene::analysis::Analyzer::setPreviousTokenStream(void*) (_ThreadLocal.h:82)
> ==25753==    by 0x41E5F18: lucene::analysis::WhitespaceAnalyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:120)
> ==25753==    by 0x41E7282: lucene::analysis::PerFieldAnalyzerWrapper::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:327)
> ==25753==    by 0x421D0F8: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
> ==25753==    by 0x421F170: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
> ==25753==    by 0x421F586: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
> ==25753==    by 0x4212F73: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
> ==25753==    by 0x42130C6: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
> ==25753==
> ==25753== 76 bytes in 1 blocks are still reachable in loss record 8 of 8
> ==25753==    at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
> ==25753==    by 0x41FBA32: global constructors keyed to IndexFileNameFilter.cpp (IndexFileNameFilter.cpp:8)
> ==25753==    by 0x429184C: ??? (in /usr/local/lib/libclucene-core.so.0.9.23.0)
> ==25753==    by 0x41C2553: ??? (in /usr/local/lib/libclucene-core.so.0.9.23.0)
> ==25753==    by 0x400D8BB: call_init (dl-init.c:70)
> ==25753==    by 0x400DA20: _dl_init (dl-init.c:134)
> ==25753==    by 0x400088E: ??? (in /lib/ld-2.10.1.so)
> ==25753==
> ==25753== LEAK SUMMARY:
> ==25753==    definitely lost: 0 bytes in 0 blocks
> ==25753==    indirectly lost: 0 bytes in 0 blocks
> ==25753==      possibly lost: 0 bytes in 0 blocks
> ==25753==    still reachable: 264 bytes in 8 blocks
> ==25753==         suppressed: 0 bytes in 0 blocks
> ==25753==
> ==25753== For counts of detected and suppressed errors, rerun with: -v
> ==25753== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 27 from 8)

These are tiny leaks (264 bytes in total, all still reachable), but perhaps one of them grows into a much larger leak with bigger input? I will run valgrind on the test program I sent you before and send you the output.

Michael Levin wrote:
> It looks like the leaks have been fixed! Thank you so much!! :)
>
> Itamar Syn-Hershko wrote:
>> Thanks. 31 MB+ is a serious leak indeed...
>>
>> The issue was with an incomplete implementation of the internal
>> reusableTokenStream; it was only visible when more than one analyzer was
>> added, and only with StandardAnalyzer. A similar issue may exist with
>> StopAnalyzer; I fixed it there as well but haven't tested it.
>>
>> I didn't have time to thoroughly test this, so please let me know if there
>> are any more issues with this.
>>
>> Updated git master with latest code.
>>
>> Itamar.
>>
>> -----Original Message-----
>> From: Michael Levin [mailto:mele...@stanford.edu]
>> Sent: Wednesday, November 04, 2009 9:12 AM
>> To: clucene-developers@lists.sourceforge.net
>> Subject: Re: [CLucene-dev] PerFieldAnalyzerWrapper memory leak
>>
>> The problem appears to be using a WhitespaceAnalyzer inside of a
>> PerFieldAnalyzerWrapper. Try running this program and change the analyzer
>> sub-type by toggling the #defines:
>>
>>
>> #include <cstdio>
>> #include <CLucene.h>
>>
>> #define INDEX_PATH "index"
>> #define USE_PER_FIELD_ANALYZER
>> #define SUB_ANALYZER_TYPE lucene::analysis::WhitespaceAnalyzer
>> //#define SUB_ANALYZER_TYPE lucene::analysis::standard::StandardAnalyzer
>>
>> int main(int argc, char *argv[]) {
>>     try {
>> #ifdef USE_PER_FIELD_ANALYZER
>>         lucene::analysis::PerFieldAnalyzerWrapper analyzer(
>>             _CLNEW lucene::analysis::standard::StandardAnalyzer());
>>         analyzer.addAnalyzer(_T("First"), _CLNEW SUB_ANALYZER_TYPE());
>>         analyzer.addAnalyzer(_T("Second"), _CLNEW SUB_ANALYZER_TYPE());
>>         analyzer.addAnalyzer(_T("Third"), _CLNEW SUB_ANALYZER_TYPE());
>>         analyzer.addAnalyzer(_T("Fourth"), _CLNEW SUB_ANALYZER_TYPE());
>>         analyzer.addAnalyzer(_T("Fifth"), _CLNEW SUB_ANALYZER_TYPE());
>> #else
>>         lucene::analysis::WhitespaceAnalyzer analyzer;
>> #endif
>>         lucene::index::IndexWriter writer(INDEX_PATH, &analyzer, true);
>>         lucene::document::Document doc;
>>         int flags = lucene::document::Field::STORE_YES
>>             | lucene::document::Field::INDEX_TOKENIZED;
>>         for (int i = 0; i < 1000000; i++) {
>>             doc.clear();
>>             doc.add(*(_CLNEW lucene::document::Field(
>>                 _T("First"), _T("Blah blah blah"), flags)));
>>             doc.add(*(_CLNEW lucene::document::Field(
>>                 _T("Second"), _T("Blah blah-- blah"), flags)));
>>             doc.add(*(_CLNEW lucene::document::Field(
>>                 _T("Fifth"), _T("Blah blah__ blah"), flags)));
>>             doc.add(*(_CLNEW lucene::document::Field(
>>                 _T("Eigth"), _T("Blah blah blah++"), flags)));
>>             doc.add(*(_CLNEW lucene::document::Field(
>>                 _T("Ninth"), _T("Blah123 blah blah"), flags)));
>>             writer.addDocument(&doc);
>>         }
>>         writer.close();
>>     } catch (CLuceneError& err) {
>>         printf("CLuceneError: %s\n", err.what());
>>     }
>>     return 0;
>> }
>>
>>
>> Running valgrind gives this:
>>> ==5003== Memcheck, a memory error detector
>>> ==5003== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
>>> ==5003== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
>>> ==5003== Command: ./testcl
>>> ==5003== Parent PID: 25703
>>> ==5003==
>>> ==5003==
>>> ==5003== HEAP SUMMARY:
>>> ==5003==     in use at exit: 31,840,378 bytes in 50,010 blocks
>>> ==5003==   total heap usage: 231,219 allocs, 181,209 frees, 39,843,697 bytes allocated
>>> ==5003==
>>> ==5003== 254 (32 direct, 222 indirect) bytes in 1 blocks are definitely lost in loss record 10 of 13
>>> ==5003==    at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>>> ==5003==    by 0x41D8C6D: lucene::store::FSDirectory::getDirectory(char const*, bool, lucene::store::LockFactory*) (FSDirectory.cpp:485)
>>> ==5003==    by 0x42375F8: lucene::index::IndexWriter::IndexWriter(char const*, lucene::analysis::Analyzer*, bool) (IndexWriter.cpp:152)
>>> ==5003==    by 0x80490D9: main (testcl.cc:23)
>>> ==5003==
>>> ==5003== 14,672 bytes in 14 blocks are possibly lost in loss record 11 of 13
>>> ==5003==    at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>>> ==5003==    by 0x41CCDC2: lucene::analysis::WhitespaceAnalyzer::tokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:113)
>>> ==5003==    by 0x41CC309: lucene::analysis::PerFieldAnalyzerWrapper::tokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:298)
>>> ==5003==    by 0x41CFFCE: lucene::analysis::Analyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (AnalysisHeader.cpp:36)
>>> ==5003==    by 0x4206228: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
>>> ==5003==    by 0x42082A0: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>>> ==5003==    by 0x42086B6: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>>> ==5003==    by 0x41FE293: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>>> ==5003==    by 0x41FE406: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>>> ==5003==    by 0x423BE41: lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (IndexWriter.cpp:668)
>>> ==5003==    by 0x8049331: main (testcl.cc:39)
>>> ==5003==
>>> ==5003== 400,000 bytes in 20,000 blocks are definitely lost in loss record 12 of 13
>>> ==5003==    at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>>> ==5003==    by 0x41C9AA0: lucene::analysis::standard::StandardAnalyzer::tokenStream(wchar_t const*, lucene::util::Reader*) (StandardAnalyzer.cpp:64)
>>> ==5003==    by 0x41CC309: lucene::analysis::PerFieldAnalyzerWrapper::tokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:298)
>>> ==5003==    by 0x41CFFCE: lucene::analysis::Analyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (AnalysisHeader.cpp:36)
>>> ==5003==    by 0x4206228: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
>>> ==5003==    by 0x42082A0: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>>> ==5003==    by 0x42086B6: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>>> ==5003==    by 0x41FE293: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>>> ==5003==    by 0x41FE406: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>>> ==5003==    by 0x423BE41: lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (IndexWriter.cpp:668)
>>> ==5003==    by 0x8049331: main (testcl.cc:39)
>>> ==5003==
>>> ==5003== 31,425,328 bytes in 29,986 blocks are definitely lost in loss record 13 of 13
>>> ==5003==    at 0x4025390: operator new(unsigned int) (vg_replace_malloc.c:214)
>>> ==5003==    by 0x41CCDC2: lucene::analysis::WhitespaceAnalyzer::tokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:113)
>>> ==5003==    by 0x41CC309: lucene::analysis::PerFieldAnalyzerWrapper::tokenStream(wchar_t const*, lucene::util::Reader*) (Analyzers.cpp:298)
>>> ==5003==    by 0x41CFFCE: lucene::analysis::Analyzer::reusableTokenStream(wchar_t const*, lucene::util::Reader*) (AnalysisHeader.cpp:36)
>>> ==5003==    by 0x4206228: lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) (DocumentsWriterThreadState.cpp:889)
>>> ==5003==    by 0x42082A0: lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:795)
>>> ==5003==    by 0x42086B6: lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) (DocumentsWriterThreadState.cpp:554)
>>> ==5003==    by 0x41FE293: lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) (DocumentsWriter.cpp:934)
>>> ==5003==    by 0x41FE406: lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (DocumentsWriter.cpp:918)
>>> ==5003==    by 0x423BE41: lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) (IndexWriter.cpp:668)
>>> ==5003==    by 0x8049331: main (testcl.cc:39)
>>> ==5003==
>>> ==5003== LEAK SUMMARY:
>>> ==5003==    definitely lost: 31,825,360 bytes in 49,987 blocks
>>> ==5003==    indirectly lost: 222 bytes in 5 blocks
>>> ==5003==      possibly lost: 14,672 bytes in 14 blocks
>>> ==5003==    still reachable: 124 bytes in 4 blocks
>>> ==5003==         suppressed: 0 bytes in 0 blocks
>>> ==5003== Reachable blocks (those to which a pointer was found) are not shown.
>>> ==5003== To see them, rerun with: --leak-check=full --show-reachable=yes
>>> ==5003==
>>> ==5003== For counts of detected and suppressed errors, rerun with: -v
>>> ==5003== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 27 from 8)
>> Thanks for looking into this!
>>
>>
>> Itamar Syn-Hershko wrote:
>>> Hi,
>>>
>>> I ran TestAnalyzers.cpp (specifically testPerFieldAnalzyerWrapper())
>>> from our test suite, and detected no leaks. I also tried replacing
>>>
>>> analyzer.addAnalyzer(_T("special"), _CLNEW SimpleAnalyzer());
>>>
>>> with
>>>
>>> analyzer.addAnalyzer(_T("special"), _CLNEW StandardAnalyzer());
>>>
>>> and still found nothing.
>>>
>>> I used our 2_3_2 master branch from the git repository (see
>>> http://clucene.sourceforge.net/download.shtml).
>>>
>>> If you're using this branch, please let me know the details of the
>>> leaks you're detecting.
>>>
>>> Itamar.
>>>
>>> -----Original Message-----
>>> From: Michael Levin [mailto:mele...@stanford.edu]
>>> Sent: Monday, November 02, 2009 8:47 PM
>>> To: clucene-developers@lists.sourceforge.net
>>> Subject: [CLucene-dev] PerFieldAnalyzerWrapper memory leak
>>>
>>> Hi,
>>>
>>> I am working on a program to index about 25 GB of data, and when I run
>>> CLucene with a PerFieldAnalyzerWrapper it leaks memory and inevitably
>>> crashes because it runs out of memory.
>>>
>>> Here is my code:
>>>
>>> lucene::analysis::PerFieldAnalyzerWrapper
>>>     analyzer(new lucene::analysis::standard::StandardAnalyzer());
>>> analyzer.addAnalyzer(_T("Authors"),
>>>     new lucene::analysis::WhitespaceAnalyzer());
>>> analyzer.addAnalyzer(_T("ReprintAuthor"),
>>>     new lucene::analysis::WhitespaceAnalyzer());
>>> analyzer.addAnalyzer(_T("Name"),
>>>     new lucene::analysis::WhitespaceAnalyzer());
>>> analyzer.addAnalyzer(_T("Email"),
>>>     new lucene::analysis::WhitespaceAnalyzer());
>>>
>>> If I replace that snippet with a plain WhitespaceAnalyzer there is no
>>> memory leak:
>>>
>>> lucene::analysis::WhitespaceAnalyzer analyzer;
>>>
>>> Am I using the PerFieldAnalyzerWrapper class wrong or is this a bug in
>>> CLucene?
>>>
>>> Thanks!

--
Michael Levin <mele...@stanford.edu>

_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers