I had a quick look. It seems cl_demo is indeed broken (following some work Ben did on it a year ago), and we are working on it. As I said before, try using CLucene for your needs as-is, and let us know if you hit any walls.
The UTF8 test fails because of src\test\data\utf8text\french_utf8.txt. I can't seem to commit it with -crlf -diff, so an extra LF is added and that breaks the UTF8 code. It could be a code issue as well, but that is less likely.

Itamar.

> -----Original Message-----
> From: Klemens Friedl [mailto:fri...@gmail.com]
> Sent: Sunday, June 13, 2010 6:21 PM
> To: clucene-developers@lists.sourceforge.net
> Subject: Re: [CLucene-dev] clucene - cl_demo stops with error
>
> I tried to execute the cl_demo in both versions with a reduced reuters corpus; see difference in the files:
> files-reuters_*.txt
>
> the branch cl_demo version worked fine with less files, weird:
> branch_cl_demo_short.txt
>
> the master version does work only with very few files, with a bit more it shows another error:
> cl_demo_short_*.txt
>
> Klemens
>
> 2010/6/13 Klemens Friedl <fri...@gmail.com>:
> > cl_test apps run, but one test fails in both, and both versions run a different amount of tests.
> >
> > cl_test (master) runs 97 tests, one of the two UTF8 tests failed, see "cl_test.txt" file (attached last email).
> > cl_test (atomicthreads) runs 102 tests, also one of the two UTF8 tests failed, see "branch_cl_test.txt" file.
> >
> > cl_demo crashes in both. yesterday, I tried to test cl_demo with only circa half of the documents of the reuters test directory, and it run through fine. I played a bit around and it seems that cl_demo crashes while indexing text files with a few kilobytes (files that are a bit larger than the smallest text files in the directory). The index merging and optimizing process takes unusually (in my opinion) long time, as the index files are combined maybe a megabyte of disc space. weird.
> >
> > 2010/6/13 Itamar Syn-Hershko <ita...@divrei-tora.com>:
> >> Just to confirm: for both branches, cl_test works fine but cl_demo crashes?
> >>
> >>> -----Original Message-----
> >>> From: Klemens Friedl [mailto:fri...@gmail.com]
> >>> Sent: Sunday, June 13, 2010 5:37 PM
> >>> To: clucene-developers@lists.sourceforge.net
> >>> Subject: Re: [CLucene-dev] clucene - cl_demo stops with error
> >>>
> >>> I built and executed cl_test and cl_demo again on master and the atomicthreads branch with default cmake settings, see attached files.
> >>> I included a stack trace for the cl_demo app in both cases.
> >>>
> >>> Klemens
> >>>
> >>> 2010/6/13 Itamar Syn-Hershko <ita...@divrei-tora.com>:
> >>> > Can you please test the master branch (cl_test and cl_demo) with default cmake settings as well?
> >>> >
> >>> > Also, can you send the stacktrace for this deadlock? If you get this on master, then for master, otherwise for atomicthreads.
> >>> >
> >>> > Itamar.
> >>> >
> >>> >> -----Original Message-----
> >>> >> From: Klemens Friedl [mailto:fri...@gmail.com]
> >>> >> Sent: Sunday, June 13, 2010 11:16 AM
> >>> >> To: clucene-developers@lists.sourceforge.net
> >>> >> Subject: Re: [CLucene-dev] clucene - cl_demo stops with error
> >>> >>
> >>> >> I tried the cl_test and cl_demo with the atomicthreads branch and default cmake settings (except added zlib path vars).
> >>> >> (see attached log files)
> >>> >>
> >>> >> cl_test runs through 102 tests, but fails on first of two UTF8 tests.
> >>> >> cl_demo indexes all files of the reuters corpus, though it deadlocks right after that :/
> >>> >>
> >>> >> Kind regards,
> >>> >> Klemens Friedl
> >>> >>
> >>> >> F:\Home\Search\clucene\atomicthreads\build\bin\Debug>cl_test.exe
> >>> >> Key: .= pass N=not implemented F=fail
> >>> >> All CLucene Tests:
> >>> >> CLucene Atomic Updates Test: .. - 6203ms
> >>> >> CLucene IndexReader Test: .. - 766ms
> >>> >> CLucene Reuters Test: ... - 8547ms
> >>> >> CLucene Analysis Test: . - 0ms
> >>> >> CLucene Analyzers Test: ......... - 234ms
> >>> >> CLucene Document Test: ...... - 4563ms
> >>> >> CLucene Number Tools Test: ... - 422ms
> >>> >> CLucene Debug Test: . - 0ms
> >>> >> CLucene IndexWriter Test: ...... - 4281ms
> >>> >> CLucene IndexModifier Test: . - 56047ms
> >>> >> CLucene High Frequencies Test: . - 16ms
> >>> >> CLucene Priority Queue Test: . - 62ms
> >>> >> CLucene DateTools Test: .. - 0ms
> >>> >> CLucene Query Parser Test: ............... - 63ms
> >>> >> CLucene Multi-Field QP Test: .. - 0ms
> >>> >> CLucene Boolean Tests: .... - 15ms
> >>> >> CLucene Search Test: .............. - 609ms
> >>> >> CLucene Queries Test: .. - 16ms
> >>> >> CLucene Term Vector Test: ..... - 78ms
> >>> >> CLucene Sort Test: ........... - 79ms
> >>> >> CLucene Duplicates Test: .. - 125ms
> >>> >> CLucene DateFilter Test: ... - 78ms
> >>> >> CLucene Wildcard Test: .. - 0ms
> >>> >> CLucene Store Test: .. - 297ms
> >>> >> CLucene UTF8 Test: F. - 187ms
> >>> >>
> >>> >> 102 tests run: 101 passed, 1 failed, 0 not implemented.
> >>> >>
> >>> >> Tests run in 82843ms
> >>> >>
> >>> >> WARNING: stringPool still contains intern'd strings (refcounts):
> >>> >> contents (10)
> >>> >> field1 (5)
> >>> >> field2 (5)
> >>> >> field3 (5)
> >>> >> field4 (5)
> >>> >> id (4)
> >>> >>
> >>> >> F:\Home\Search\clucene\atomicthreads\build\bin\Debug>cl_demo.exe
> >>> >> adding file 1: ..\src\test\data\reuters-21578/all-exchanges-strings.lc.txt
> >>> >> adding file 2: ..\src\test\data\reuters-21578/all-orgs-strings.lc.txt
> >>> >> adding file 3: ..\src\test\data\reuters-21578/all-people-strings.lc.txt
> >>> >> adding file 4: ..\src\test\data\reuters-21578/all-places-strings.lc.txt
> >>> >> adding file 5: ..\src\test\data\reuters-21578/all-topics-strings.lc.txt
> >>> >> adding file 6: ..\src\test\data\reuters-21578/cat-descriptions_120396.txt
> >>> >> adding file 7: ..\src\test\data\reuters-21578/feldman-cia-worldfactbook-data.txt
> >>> >> adding file 8: ..\src\test\data\reuters-21578/LEWIS.DTD
> >>> >> adding file 9: ..\src\test\data\reuters-21578/README.TXT
> >>> >> adding file 10: ..\src\test\data\reuters-21578/reut2-000.sgm
> >>> >> adding file 11: ..\src\test\data\reuters-21578/reut2-001.sgm
> >>> >> adding file 12: ..\src\test\data\reuters-21578/reut2-002.sgm
> >>> >> adding file 13: ..\src\test\data\reuters-21578/reut2-003.sgm
> >>> >> adding file 14: ..\src\test\data\reuters-21578/reut2-004.sgm
> >>> >> adding file 15: ..\src\test\data\reuters-21578/reut2-005.sgm
> >>> >> adding file 16: ..\src\test\data\reuters-21578/reut2-006.sgm
> >>> >> adding file 17: ..\src\test\data\reuters-21578/reut2-007.sgm
> >>> >> adding file 18: ..\src\test\data\reuters-21578/reut2-008.sgm
> >>> >> adding file 19: ..\src\test\data\reuters-21578/reut2-009.sgm
> >>> >> adding file 20: ..\src\test\data\reuters-21578/reut2-010.sgm
> >>> >> adding file 21: ..\src\test\data\reuters-21578/reut2-011.sgm
> >>> >> adding file 22: ..\src\test\data\reuters-21578/reut2-012.sgm
> >>> >> adding file 23: ..\src\test\data\reuters-21578/reut2-013.sgm
> >>> >> adding file 24: ..\src\test\data\reuters-21578/reut2-014.sgm
> >>> >> adding file 25: ..\src\test\data\reuters-21578/reut2-015.sgm
> >>> >> adding file 26: ..\src\test\data\reuters-21578/reut2-016.sgm
> >>> >> adding file 27: ..\src\test\data\reuters-21578/reut2-017.sgm
> >>> >> adding file 28: ..\src\test\data\reuters-21578/reut2-018.sgm
> >>> >> adding file 29: ..\src\test\data\reuters-21578/reut2-019.sgm
> >>> >> adding file 30: ..\src\test\data\reuters-21578/reut2-020.sgm
> >>> >> adding file 31: ..\src\test\data\reuters-21578/reut2-021.sgm
> >>> >>
> >>> >> Debug Assertion Failed!
> >>> >> Expression: _BLOCK_TYPE_IS_VALID(pHead->nBlockUse)
> >>> >>
> >>> >> VS 2008 debugger reports a deadlock in:
> >>> >> atomicthreads\clucene\src\core\CLucene\util\Array.h (line 139)
> >>> >>
> >>> >> 2010/6/12 Klemens Friedl <fri...@gmail.com>:
> >>> >> > I forgot to mention that I ran the cl_test app earlier today, it stopped with a failure at test 97.
> >>> >> > (although, I may have used slightly different cmake settings)
> >>> >> >
> >>> >> > I will try out that branch tomorrow, as it's already late there.
> >>> >> >
> >>> >> > Klemens
> >>> >> >
> >>> >> > 2010/6/12 Itamar Syn-Hershko <ita...@divrei-tora.com>:
> >>> >> >> I'm running cl_test on a similar environment without any problem (using the default CMake config). One of the tests there indexes the reuters corpus too. Can you try running that?
> >>> >> >>
> >>> >> >> The actual error looks like something we fixed in the atomicthreads branch, and wasn't merged into master yet due to lack of feedback. Can you try running demo from that branch (after trying cl_test too)?
> >>> >> >>
> >>> >> >> Itamar.
> >>> >> >>
> >>> >> >>> -----Original Message-----
> >>> >> >>> From: Klemens Friedl [mailto:fri...@gmail.com]
> >>> >> >>> Sent: Saturday, June 12, 2010 10:19 PM
> >>> >> >>> To: clucene-developers@lists.sourceforge.net
> >>> >> >>> Subject: [CLucene-dev] clucene - cl_demo stops with error
> >>> >> >>>
> >>> >> >>> clucene - cl_demo stops with error while indexing reuters corpus
> >>> >> >>>
> >>> >> >>> clucene version: current git current master
> >>> >> >>> platform: WinXP SP3
> >>> >> >>> build system: VS 2008 SP1
> >>> >> >>> cmake: 2.8.1
> >>> >> >>> cmake settings: see cmakecache.txt file (attached to email)
> >>> >> >>>
> >>> >> >>> cl_demo app stops with error:
> >>> >> >>> (one code line changed only to meet path to reuters-21578 corpus directory)
> >>> >> >>>
> >>> >> >>> F:\Home\Search\clucene\build\bin\Debug>cl_demo.exe
> >>> >> >>> adding file 1: src\test\data\reuters-21578/all-exchanges-strings.lc.txt
> >>> >> >>> adding file 2: src\test\data\reuters-21578/all-orgs-strings.lc.txt
> >>> >> >>> adding file 3: src\test\data\reuters-21578/all-people-strings.lc.txt
> >>> >> >>> adding file 4: src\test\data\reuters-21578/all-places-strings.lc.txt
> >>> >> >>> adding file 5: src\test\data\reuters-21578/all-topics-strings.lc.txt
> >>> >> >>> adding file 6: src\test\data\reuters-21578/cat-descriptions_120396.txt
> >>> >> >>> adding file 7: src\test\data\reuters-21578/feldman-cia-worldfactbook-data.txt
> >>> >> >>> adding file 8: src\test\data\reuters-21578/LEWIS.DTD
> >>> >> >>> adding file 9: src\test\data\reuters-21578/README.TXT
> >>> >> >>> adding file 10: src\test\data\reuters-21578/reut2-000.sgm
> >>> >> >>> adding file 11: src\test\data\reuters-21578/reut2-001.sgm
> >>> >> >>>
> >>> >> >>> => VS 2008 SP1 debugger:
> >>> >> >>> Unhandled exception at 0x10099e4f (clucene-cored.dll) in cl_demo.exe: 0xC0000005: Access violation writing location 0x01034f74.
> >>> >> >>>
> >>> >> >>> file:
> >>> >> >>> clucene\src\core\CLucene\index\DocumentsWriterThreadState.cpp (line 642)
> >>> >> >>>
> >>> >> >>> the lucene index file, (output from "dir" command):
> >>> >> >>>
> >>> >> >>> F:\Home\Search\clucene\build\bin\Debug\data>dir
> >>> >> >>> Verzeichnis von F:\Home\Search\clucene\build\bin\Debug\data
> >>> >> >>>
> >>> >> >>> 12.06.2010 20:56 <DIR> .
> >>> >> >>> 12.06.2010 20:56 <DIR> ..
> >>> >> >>> 12.06.2010 20:56 20 segments.gen
> >>> >> >>> 12.06.2010 20:56 45 segments_3
> >>> >> >>> 12.06.2010 20:56 0 write.lock
> >>> >> >>> 12.06.2010 20:56 536.020 _0.cfs
> >>> >> >>> 12.06.2010 20:58 114.688 _1.fdt
> >>> >> >>> 12.06.2010 20:56 0 _1.fdx
> >>> >> >>> 6 Datei(en) 650.773 Bytes
> >>> >> >>>
> >>> >> >>> If I remove half of the reuters-21578 corpus files of the corpus directory, the cl_demo runs through fine !!
> >>> >> >>>
> >>> >> >>> I tried various settings with cmake. I am using gnuwin32's zlib. I am not using iconv - as it appeared to me as optional component.
> >>> >> >>> What are the preferred and tested cmake settings for a common environment?
> >>> >> >>> I need unicode support, multithreading would be a nice to have, if possible i would like to avoid iconv.
> >>> >> >>>
> >>> >> >>> Kind regards,
> >>> >> >>> Klemens Friedl
> >>> >> >>>
> >>> >> >>> btw.
> >>> >> >>> the _LUCENE_THREAD_FUNC(atomicIndexTest, _writer) and _LUCENE_THREAD_FUNC(atomicSearchTest, _directory) may need a return statement, as VS informed me, while testing other cmake settings.
> >>> >> >>> file: clucene\src\test\index\TestThreading.cpp (line 18, 51)
> >>> >> >>>
> >>> >> >> ------------------------------------------------------------------------------
> >>> >> >> ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win:
> >>> >> >> http://p.sf.net/sfu/thinkgeek-promo
> >>> >> >> _______________________________________________
> >>> >> >> CLucene-developers mailing list
> >>> >> >> CLucene-developers@lists.sourceforge.net
> >>> >> >> https://lists.sourceforge.net/lists/listinfo/clucene-developers
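
On the UTF8 test failure discussed at the top of this message: one way to check whether a checked-out copy of src\test\data\utf8text\french_utf8.txt has picked up extra line-ending bytes is to count them directly. The following is only a minimal sketch under stated assumptions. `LineEndingStats`, `countLineEndings` and `readFileBytes` are hypothetical helpers written for illustration; they are not part of CLucene, and the sketch does not claim to reproduce what the UTF8 test itself does.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not part of CLucene): summary of line endings in a buffer.
struct LineEndingStats {
    std::size_t crlf = 0;          // "\r\n" pairs
    std::size_t bareLf = 0;        // "\n" not preceded by "\r"
    std::size_t bareCr = 0;        // "\r" not followed by "\n"
    bool endsWithNewline = false;  // true if the last byte is '\n' or '\r'
};

// Counts CRLF pairs, bare LF bytes and bare CR bytes in the given bytes.
LineEndingStats countLineEndings(const std::string& bytes) {
    LineEndingStats s;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const char c = bytes[i];
        if (c == '\r') {
            if (i + 1 < bytes.size() && bytes[i + 1] == '\n') {
                ++s.crlf;
                ++i;  // skip the LF that belongs to this pair
            } else {
                ++s.bareCr;
            }
        } else if (c == '\n') {
            ++s.bareLf;
        }
    }
    s.endsWithNewline =
        !bytes.empty() && (bytes.back() == '\n' || bytes.back() == '\r');
    return s;
}

// Reads a whole file in binary mode, so the C runtime on Windows does not
// translate CRLF to LF while reading. Returns an empty string if the file
// cannot be opened.
std::string readFileBytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling `countLineEndings(readFileBytes(path))` on the file from a fresh checkout and on a known-good copy, then comparing the counts, would show whether the two differ only in line endings. Reading in binary mode matters here, since a text-mode read on Windows would hide exactly the bytes in question.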