> Just my thoughts on the subject. DBera: are you saying that you want
> to just work/look into the language stemming, or both the language
> stemming and the text cache? Depending on what you want to work on, I
> can help out with this, if it's something we really want to see in
> 0.3.0. Lemme know.

1. I definitely don't have the time; otherwise it would have been done by now :)

2. I will locate Arun's patch and send it out; it's a good implementation and can act as a reference.

3. The problem is less about the number of queries. It is more about sending the data to the textcache (which can store it either gzipped in sqlite or gzipped on disk), to the language determination class, and to lucene without (repeat: without) holding all the data in one huge store/string in memory.

I thought a cutoff size of disk_block_size would be a good starting point; it will reduce external fragmentation to a good degree, since most textcache files are less than 1 block. So the decision to store on disk or in sqlite can only come after we have read, say, 4KB of data. The language determination, I think, requires 1K of text.

In our filter/lucene interface, lucene asks for data, and the filters then extract a little more data from the file and send it back; this goes on in a loop until there is no more data to extract. There is no storing of data in memory!

So to do the whole thing correctly, as lucene asks for more data the filters return it, and someone in the middle transparently decides whether to store the data in sqlite or on disk (and does so). Furthermore, even before lucene asks for data, about 1K of data is extracted from the file, the language is detected, the appropriate stemmer is hooked up, and the data is kept around until lucene asks for it.

The obvious approach is to extract all the data in advance, store it in memory, decide where to store the textcache, decide the language, and then comfortably feed lucene from the stored data. That's not desired.

I hope you also see where the connection between language determination and the text cache comes in. Go for them if you or anyone wants to. Just let others know so there is no duplication of effort.

N. Let's not target a release and cram features in :) Instead, if you want to work on something, work on it.
If it is done and release-ready by 0.3, it will be included. Otherwise there is always another release. There is little sense in including lots of half-complete, poorly implemented features just to make the release notes look yummy :-)

Of course I am restating the obvious. (*)

- dBera

(*) When I sent out a to-come feature list in one of my earlier emails, I was stressing the fact that testing is becoming very important and difficult with all these different features, rather than the fact that "Wow! Now we can do XXX too". Now I think I was misread.

-- 
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user
_______________________________________________
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
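
A rough sketch of the streaming flow described in point 3 above, written in Python purely for illustration; it is not Beagle's code and not Arun's patch. The names `TextCacheWriter` and `stream_document`, the `detect_language` callback, the `textcache` table layout, and the two size constants (4KB cutoff, 1K language sample) are all assumptions made for the example. The idea it tries to show: read about 1K up front to pick the language, then let the indexer pull chunks while a layer in the middle copies each chunk to the cache and only decides between sqlite and disk once the cutoff is crossed.

```python
import gzip
import itertools
import os
import sqlite3
import tempfile
from typing import Callable, Iterable, Iterator, Optional, Tuple

BLOCK_SIZE = 4096   # assumed disk block size: cutoff between sqlite and disk
LANG_SAMPLE = 1024  # assumed amount of text the language detector needs


class TextCacheWriter:
    """Hypothetical text cache: holds at most BLOCK_SIZE bytes in memory."""

    def __init__(self, db: sqlite3.Connection, uri: str, cache_dir: str) -> None:
        self.db, self.uri, self.cache_dir = db, uri, cache_dir
        self.pending = bytearray()        # data seen before the decision
        self.gz = None                    # gzip file, once disk is chosen
        self.path: Optional[str] = None

    def write(self, data: bytes) -> None:
        if self.gz is None:
            if len(self.pending) + len(data) <= BLOCK_SIZE:
                self.pending += data      # still undecided, keep buffering
                return
            # Cutoff crossed: choose disk and flush what was held back.
            fd, self.path = tempfile.mkstemp(suffix=".gz", dir=self.cache_dir)
            os.close(fd)
            self.gz = gzip.open(self.path, "wb")
            self.gz.write(bytes(self.pending))
            self.pending.clear()
        self.gz.write(data)

    def close(self) -> None:
        if self.gz is not None:
            self.gz.close()
            blob = None                   # text lives in the gzipped file
        else:
            blob = gzip.compress(bytes(self.pending))  # small: goes in sqlite
        self.db.execute(
            "INSERT INTO textcache (uri, path, data) VALUES (?, ?, ?)",
            (self.uri, self.path, blob),
        )
        self.db.commit()


def stream_document(
    chunks: Iterable[str],
    detect_language: Callable[[str], str],
    cache: TextCacheWriter,
) -> Tuple[str, Iterator[str]]:
    """Detect the language from a small head, then stream everything lazily."""
    source = iter(chunks)
    head, size = [], 0
    for chunk in source:                  # pull only until ~LANG_SAMPLE chars
        head.append(chunk)
        size += len(chunk)
        if size >= LANG_SAMPLE:
            break
    language = detect_language("".join(head))

    def feed() -> Iterator[str]:
        try:
            # Replay the kept head first, then continue with the live source.
            for chunk in itertools.chain(head, source):
                cache.write(chunk.encode("utf-8"))
                yield chunk
        finally:
            cache.close()                 # records the sqlite-or-disk outcome

    return language, feed()
```

A caller would hook up the stemmer for the returned `language` and then hand the iterator to the indexer. Memory use stays around the head sample plus one block, provided the filter emits reasonably small chunks; a single oversized chunk would still be held whole.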