Thank you both too for reporting such important bugs and leaks... This is why this project is open source, and I wish there were many more ppl like you. Hopefully you'll get familiar enough with the code to take part in actual development and optimization...
I will have a deeper look at the patch you sent as soon as I can. It looks as if it needs a bit of tweaking, but it definitely helps a lot. Thanks.

Itamar.

-----Original Message-----
From: Michael Levin [mailto:mele...@stanford.edu]
Sent: Wednesday, November 11, 2009 5:34 AM
To: clucene-developers@lists.sourceforge.net
Subject: Re: [CLucene-dev] Cannot write >2gb index file

Itamar,

First of all, thanks for fixing StandardAnalyzer! The zero norm issues are gone. I have to agree with celtix44, you are indeed a legend. :)

I'm attaching a patch against HEAD that will add large file support for 32-bit Unix systems using #define hacks. Windows is going to be more difficult but at least CLucene can support large files on Unix easily.

I also wrote a test for IndexWriter but I couldn't get cl_test to compile (to link to be exact-- lots of linker errors when I do "make cl_test"). The test will attempt to create a huge index in the data/bigIndex directory. If the resulting index is 2.1gb in size that is bad-- we hit the 2^31 ceiling. If it creates an index file bigger than that then things are good.

Apologies if the changes in the patch are stylistically out of place with CLucene's order of things but I couldn't think up a better way to do it. I don't know CMake but perhaps the defines should be emitted by CMake on 32-bit systems only?

Itamar Syn-Hershko wrote:
> Michael,
>
> Please update your code, I just committed a fix for the bug you
> reported (commit c89f8a39fa1faa34374d8a6e92ae9c2467deeda7). Please
> test this with your code as well.
>
> With regards to the 64bit FS issue, it would be nice if you could
> provide a test and a fix for this (using some #define hacks or our
> cmake scripts). I'm just so swamped at the moment that I'm afraid I
> won't be able to do this myself anytime soon. I can provide pointers
> if necessary.
>
> Itamar.
>
> -----Original Message-----
> From: Michael Levin [mailto:mele...@stanford.edu]
> Sent: Thursday, November 05, 2009 10:07 PM
> To: clucene-developers@lists.sourceforge.net
> Subject: Re: [CLucene-dev] Cannot write >2gb index file
>
> Itamar Syn-Hershko wrote:
>> Michael,
>>
>> Thanks. However, the O_LARGEFILE flag isn't supported on Windows (for
>> all versions as far as I can tell), and might not be supported on
>> other Linux distributions, and on Mac. That being said, this is
>> something we need to test and find a solution for (probably another
>> cmake check). I'm no cross-platform wiz, so anyone willing to take
>> this up please be my guest.
>
> Thanks for looking into this.
>
> It's true that Windows doesn't support this flag. I believe on Windows
> you don't have the open64 style functions either so you must use the
> Windows API equivalents (e.g. CreateFile()). I can see how
> inconvenient this can get though as you probably won't be able to get
> away with a platform-agnostic _cl_open() function...
>
>> On that note, I haven't tested your code to see if it crashes on
>> Windows as well. Might be interesting to see tho.
>
> I believe it should though definitely something worth testing.
>
>> I see no reason why this will break StandardAnalyzer. Can you provide
>> more details please?
>
> Honestly I don't know why it would either. It may not have been my
> changes actually, cel tix44 just sent out an email saying the last two
> commits broke the StandardAnalyzer ("[CLucene-dev] StandardAnalyzer
> broken - GIT 364c21b6c3f54fbb90df223621b660197366fb93"). I was using git
> to switch from my branch to head and I thought that the
> StandardAnalyzer was working in HEAD though I may have made a mistake...
>
> The exact problem is missing norms. When I generate a new file and
> open it in Luke or query with CLucene only the first term processed
> with StandardAnalyzer has a norm (of 1.0) and every other term has
> zero norm and won't appear in search results.
>
> I am currently using StopAnalyzer and it works fine so I wonder if the
> problem is somewhere in StandardFilter?
>
>> Itamar.
>>
>> -----Original Message-----
>> From: Michael Levin [mailto:mele...@stanford.edu]
>> Sent: Thursday, November 05, 2009 11:55 AM
>> To: clucene-developers@lists.sourceforge.net
>> Subject: Re: [CLucene-dev] Cannot write >2gb index file
>>
>> (Sorry for the email spam...)
>>
>> This change seems to break StandardAnalyzer though. I can't figure
>> out why... all of the other analyzers work fine. :-\
>>
>> Michael Levin wrote:
>>> Really easy fix, please add "O_LARGEFILE" flag everywhere _cl_open()
>>> is used. E.g.:
>>>
>>> _cl_open(buffer, O_RDWR, _S_IWRITE) -->
>>> _cl_open(buffer, O_RDWR | O_LARGEFILE, _S_IWRITE)
>>>
>>> The required header define is already defined in config files and
>>> adding this flag shouldn't affect 64-bit machines in any way. Thanks!

--
Michael Levin <mele...@stanford.edu>
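
For anyone following the thread, here is a minimal sketch of the two ideas discussed above: the "#define hack" (defining _FILE_OFFSET_BITS as 64 before any system header, which on glibc makes off_t and open()/lseek() 64-bit on 32-bit systems) and a guarded O_LARGEFILE that falls back to 0 where the flag does not exist. This is not the attached patch, which is not reproduced here. It assumes a POSIX system; open_large() and size_after_write_at() are hypothetical names for illustration, not CLucene functions.

```c
/* Must precede every system header so that off_t, open() and lseek()
   resolve to their 64-bit variants on 32-bit glibc systems. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Platforms that lack O_LARGEFILE (or do not expose it) get a no-op flag. */
#ifndef O_LARGEFILE
#define O_LARGEFILE 0
#endif

/* Compile-time check: refuses to build if off_t is still narrower than 64 bits. */
typedef char off_t_is_64bit[sizeof(off_t) >= 8 ? 1 : -1];

/* Hypothetical helper (not part of CLucene): open or create a file for
   read/write, requesting large-file support where the platform offers it. */
int open_large(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_LARGEFILE, 0644);
}

/* Hypothetical helper: write one byte at `offset`, report the resulting
   file size, then delete the file. Returns -1 if any step fails. On most
   Unix filesystems the file is sparse, so little disk space is used. */
off_t size_after_write_at(const char *path, off_t offset)
{
    off_t size = -1;
    int fd;

    assert(offset >= 0);
    fd = open_large(path);
    if (fd < 0)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == offset && write(fd, "x", 1) == 1)
        size = lseek(fd, 0, SEEK_END);
    close(fd);
    unlink(path);
    return size;
}
```

Calling size_after_write_at() with an offset just past 2^31 mirrors the bigIndex test idea on a small scale: a returned size above 2^31 means the ceiling is gone, while -1 means opening, seeking or writing failed. Whether the define is best emitted by CMake only on 32-bit systems is the open question raised above; this sketch simply hard-codes it.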