Thank you both too for reporting such important bugs and leaks... This is
why this project is open source, and I wish there were many more people like
you. Hopefully you'll get familiar enough with the code to take part in
actual development and optimization...

I will have a deeper look at the patch you sent as soon as I can. It looks
as if it needs a bit of tweaking, but it definitely helps a lot. Thanks.

Itamar. 

-----Original Message-----
From: Michael Levin [mailto:mele...@stanford.edu] 
Sent: Wednesday, November 11, 2009 5:34 AM
To: clucene-developers@lists.sourceforge.net
Subject: Re: [CLucene-dev] Cannot write >2gb index file

Itamar,

First of all, thanks for fixing StandardAnalyzer! The zero norm issues are
gone. I have to agree with celtix44, you are indeed a legend. :)

I'm attaching a patch against HEAD that will add large file support for
32-bit Unix systems using #define hacks. Windows is going to be more
difficult but at least CLucene can support large files on Unix easily.
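
For readers without the attachment, here is a minimal sketch of what "#define hacks" for large file support usually look like on glibc-based systems. It is not the contents of the patch: the helper name file_offset_bits() is made up for illustration, and it assumes the macros are defined before the first system header is included.

```cpp
// Hypothetical sketch, not the attached patch.
// Both macros must be defined before any system header is included.
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64   // glibc: make off_t, open() and lseek() 64-bit
#endif
#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE 1    // expose fseeko() and ftello()
#endif

#include <sys/types.h>
#include <cstddef>

// Fail the build early if off_t is still 32 bits wide.
static_assert(sizeof(off_t) >= 8,
              "off_t is not 64-bit; large file support is off");

// Hypothetical helper: number of bits available for file offsets.
constexpr std::size_t file_offset_bits() { return sizeof(off_t) * 8; }
```

On a 64-bit system the defines should be harmless, since off_t is already 64 bits wide there.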

I also wrote a test for IndexWriter, but I couldn't get cl_test to compile
(to link, to be exact: lots of linker errors when I do "make cl_test"). The
test will attempt to create a huge index in the data/bigIndex directory. If
the resulting index is 2.1gb in size, that is bad: we hit the 2^31 ceiling.
If it creates an index file bigger than that, then things are good.
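
The pass/fail rule above can be sketched roughly as follows. This is not the IndexWriter test from the patch: the names kSigned32BitLimit, passed_2gb_ceiling() and can_write_past_2gb() are hypothetical, and the probe assumes a POSIX system whose filesystem supports sparse files (otherwise it really writes about 2gb).

```cpp
#include <cstdint>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Largest offset a signed 32-bit off_t can represent: 2^31 - 1 bytes.
constexpr std::uint64_t kSigned32BitLimit = (std::uint64_t(1) << 31) - 1;

// Hypothetical helper: true when a file of this size needs 64-bit offsets.
constexpr bool passed_2gb_ceiling(std::uint64_t size_bytes) {
    return size_bytes > kSigned32BitLimit;
}

// Hypothetical probe: seek just past the 2^31 boundary, write one byte,
// and check that the reported file size is beyond the ceiling.
// The probe file is removed again before returning.
inline bool can_write_past_2gb(const char* path) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;

    const off_t target = static_cast<off_t>(kSigned32BitLimit + 1);
    bool ok = sizeof(off_t) >= 8
           && lseek(fd, target, SEEK_SET) == target
           && write(fd, "x", 1) == 1;

    struct stat st;
    ok = ok && fstat(fd, &st) == 0
            && passed_2gb_ceiling(static_cast<std::uint64_t>(st.st_size));

    close(fd);
    unlink(path);
    return ok;
}
```

A call such as can_write_past_2gb("data/bigIndex/probe.tmp") would be one way to check the platform cheaply before building a full index; the path is only an example.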

Apologies if the changes in the patch are stylistically out of place with
CLucene's order of things but I couldn't think up a better way to do it. I
don't know CMake but perhaps the defines should be emitted by CMake on
32-bit systems only?

Itamar Syn-Hershko wrote:
> Michael,
> 
> Please update your code, I just committed a fix for the bug you 
> reported (commit c89f8a39fa1faa34374d8a6e92ae9c2467deeda7). Please 
> test this with your code as well.
> 
> With regards to the 64bit FS issue, it would be nice if you could 
> provide a test and a fix for this (using some #define hacks or our 
> cmake scripts). I'm just so swamped at the moment that I'm afraid I 
> won't be able to do this myself anytime soon. I can provide pointers if
> necessary.
> 
> Itamar. 
> 
> -----Original Message-----
> From: Michael Levin [mailto:mele...@stanford.edu]
> Sent: Thursday, November 05, 2009 10:07 PM
> To: clucene-developers@lists.sourceforge.net
> Subject: Re: [CLucene-dev] Cannot write >2gb index file
> 
> Itamar Syn-Hershko wrote:
>> Michael,
>>
>> Thanks. However, the O_LARGEFILE flag isn't supported on Windows (for 
>> all versions as far as I can tell), and might not be supported on 
>> other Linux distributions, and on Mac. That being said, this is 
>> something we need to test and find a solution for (probably another 
>> cmake check). I'm no cross-platform wiz, so anyone willing to take 
>> this up, please be my guest.
> 
> Thanks for looking into this.
> 
> It's true that Windows doesn't support this flag. I believe on Windows 
> you don't have the open64 style functions either so you must use the 
> Windows API equivalents (e.g. CreateFile()). I can see how 
> inconvenient this can get though as you probably won't be able to get 
> away with a platform-agnostic
> _cl_open() function...
> 
>> On that note, I haven't tested your code to see if it crashes on 
>> Windows as well. Might be interesting to see tho.
> 
> I believe it should, though it's definitely something worth testing.
> 
>> I see no reason why this will break StandardAnalyzer. Can you provide 
>> more details please?
> 
> Honestly I don't know why it would either. It may not have been my 
> changes actually, celtix44 just sent out an email saying the last two 
> commits broke the StandardAnalyzer ("[CLucene-dev] StandardAnalyzer
> broken - GIT  364c21b6c3f54fbb90df223621b660197366fb93"). I was using git 
> to switch from my branch to head and I thought that the 
> StandardAnalyzer was working in HEAD though I may have made a mistake...
> 
> The exact problem is missing norms. When I generate a new file and 
> open it in Luke or query with CLucene only the first term processed 
> with StandardAnalyzer has a norm (of 1.0) and every other term has 
> zero norm and won't appear in search results.
> 
> I am currently using StopAnalyzer and it works fine so I wonder if the 
> problem is somewhere in StandardFilter?
> 
>> Itamar. 
>>
>> -----Original Message-----
>> From: Michael Levin [mailto:mele...@stanford.edu]
>> Sent: Thursday, November 05, 2009 11:55 AM
>> To: clucene-developers@lists.sourceforge.net
>> Subject: Re: [CLucene-dev] Cannot write >2gb index file
>>
>> (Sorry for the email spam...)
>>
>> This change seems to break StandardAnalyzer though. I can't figure 
>> out why... all of the other analyzers work fine. :-\
>>
>> Michael Levin wrote:
>>> Really easy fix, please add "O_LARGEFILE" flag everywhere _cl_open() 
>>> is used. E.g.:
>>>
>>>    _cl_open(buffer, O_RDWR, _S_IWRITE) -->
>>>    _cl_open(buffer, O_RDWR | O_LARGEFILE, _S_IWRITE)
>>>
>>> The required header define is already defined in config files and 
>>> adding this flag shouldn't affect 64-bit machines in any way. Thanks!
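
One possible way to keep the O_LARGEFILE suggestion quoted above compiling on platforms whose headers do not define the flag is a fallback define. This is only a sketch under that assumption: with_large_file() is a hypothetical helper, not part of CLucene, and it does nothing for Windows, where the flag does not exist at all.

```cpp
#include <fcntl.h>

// O_LARGEFILE is not available everywhere. Where the headers do not
// define it, fall back to 0 so that OR-ing it into the flags is a no-op.
#ifndef O_LARGEFILE
#define O_LARGEFILE 0
#endif

// Hypothetical helper: add the large-file flag to a set of open() flags.
constexpr int with_large_file(int flags) {
    return flags | O_LARGEFILE;
}
```

The call from the example would then read _cl_open(buffer, with_large_file(O_RDWR), _S_IWRITE).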

--
Michael Levin <mele...@stanford.edu>


