Wonderful, and the tests (TestRussianStems) pass?

Thanks,
Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Monday, July 06, 2009 5:37 PM
> To: java-dev@lucene.apache.org
> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen()
> excessively in IndexReader and IndexWriter
>
> contrib/analyzers/src/test/org/apache/lucene/analysis/ru/stemsUTF8.txt
> looks right on OpenSolaris (unix EOLs).
>
> Mike
>
> On Mon, Jul 6, 2009 at 9:53 AM, Uwe Schindler<u...@thetaphi.de> wrote:
> > I fixed the encoding problem by converting the test files to UTF-8
> > and changed the Reader charset parameter to UTF-8. All files now
> > have the old native EOL style again. Could somebody check whether on
> > Unix the files only have LF (and on Windows the files have CRLF,
> > which is the state in which I committed them)?
> >
> > The overall strange/incorrect charset conversion is not touched at
> > all, but I strongly agree to remove it (and only keep UnicodeRussian
> > as the charset parameter allowed for the analyzer) or to remove the
> > analyzer altogether.
> >
> >> -----Original Message-----
> >> From: Robert Muir [mailto:rcm...@gmail.com]
> >> Sent: Monday, July 06, 2009 3:26 PM
> >> To: java-dev@lucene.apache.org
> >> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen()
> >> excessively in IndexReader and IndexWriter
> >>
> >> Uwe, I completely agree.
> >>
> >> To add the icing on the cake, the entire analyzer appears to be just
> >> a duplication of the contrib/snowball Russian functionality...!
> >>
> >> On Mon, Jul 6, 2009 at 9:19 AM, Uwe Schindler<u...@thetaphi.de> wrote:
> >> > The whole Russian analyzer is very strange and works against all
> >> > charset/Unicode conventions.
> >> > It defines its own "charsets" (the only valid one is UNICODE),
> >> > which are all applied to standard Java 16-bit chars. The test
> >> > shows how this works: it opens a text file in KOI8 using the
> >> > "ISO-8859-1" charset, just so the code points are not modified
> >> > when converting to 16-bit Java chars (in principle it does a
> >> > deprecated "new String(byte[],0)"). These completely wrong Java
> >> > chars are then handled by the analyzer's internal charset
> >> > conversion (working on the 16-bit chars).
> >> >
> >> > The only correct usage of this package is:
> >> > - open the file with the correct encoding (when instantiating the
> >> >   Reader, so specify KOI8 or windows1251 to the Reader). The
> >> >   string then consists of correctly UTF-16 encoded Java chars. On
> >> >   this string the "pseudo-charset" UNICODE of this analyzer can
> >> >   be used.
> >> >
> >> > In my opinion, this invalid usage of Java chars should be
> >> > deprecated, the only correct pseudo-charset should be the one
> >> > specified by UNICODE, and all charset conversions should be done
> >> > using the Reader.
> >> >
> >> > Uwe
> >> >
> >> >> -----Original Message-----
> >> >> From: Robert Muir [mailto:rcm...@gmail.com]
> >> >> Sent: Monday, July 06, 2009 3:08 PM
> >> >> To: java-dev@lucene.apache.org
> >> >> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen()
> >> >> excessively in IndexReader and IndexWriter
> >> >>
> >> >> Uwe, I think so too. This way it will not be prone to breakage
> >> >> again.
> >> >>
> >> >> On Mon, Jul 6, 2009 at 8:38 AM, Uwe Schindler<u...@thetaphi.de> wrote:
> >> >> > In my opinion, these files should be converted to UTF-8 and
> >> >> > committed again (and the Reader in the test reconfigured for
> >> >> > UTF-8). Then they can be native EOL style again.
> >> >> > The problem is that SVN can only handle the EOL style for
> >> >> > one-byte-per-char and UTF-8 files.
> >> >> >
> >> >> > I will give it a try here (I have a converter).
> >> >> >
> >> >> >> -----Original Message-----
> >> >> >> From: Robert Muir [mailto:rcm...@gmail.com]
> >> >> >> Sent: Monday, July 06, 2009 1:11 PM
> >> >> >> To: java-dev@lucene.apache.org
> >> >> >> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use
> >> >> >> ensureOpen() excessively in IndexReader and IndexWriter
> >> >> >>
> >> >> >> Yeah, it's fixed now.
> >> >> >>
> >> >> >> On Mon, Jul 6, 2009 at 7:06 AM, Michael
> >> >> >> McCandless<luc...@mikemccandless.com> wrote:
> >> >> >> > Is this the native vs LF svn:eol-style that Uwe already fixed?
> >> >> >> >
> >> >> >> > Mike
> >> >> >> >
> >> >> >> > On Thu, Jul 2, 2009 at 10:03 AM, Shai Erera<ser...@gmail.com> wrote:
> >> >> >> >> Can somebody try to revert the change and test it on Windows?
> >> >> >> >>
> >> >> >> >> On Thu, Jul 2, 2009 at 4:44 PM, Robert Muir <rcm...@gmail.com> wrote:
> >> >> >> >>>
> >> >> >> >>> Well, then I have no idea why it doesn't fail. Except that
> >> >> >> >>> perhaps it's EOL-related (as Shai said), and that the
> >> >> >> >>> failure is somehow platform-dependent due to newline
> >> >> >> >>> differences between Windows and Unix (and the way these
> >> >> >> >>> are encoded in UTF-16/stored in SVN)?
> >> >> >> >>>
> >> >> >> >>> I don't really do any work with files in UTF-16, so this
> >> >> >> >>> is just a theory.
> >> >> >> >>>
> >> >> >> >>> On Thu, Jul 2, 2009 at 9:40 AM, Mark Miller<markrmil...@gmail.com> wrote:
> >> >> >> >>> > Hudson runs all the tests and emails java-dev if any of
> >> >> >> >>> > them fail.
> >> >> >> >>> >
> >> >> >> >>> > On Thu, Jul 2, 2009 at 9:37 AM, Robert Muir (JIRA) <j...@apache.org> wrote:
> >> >> >> >>> >>
> >> >> >> >>> >> [ https://issues.apache.org/jira/browse/LUCENE-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726479#action_12726479 ]
> >> >> >> >>> >>
> >> >> >> >>> >> Robert Muir commented on LUCENE-1707:
> >> >> >> >>> >> -------------------------------------
> >> >> >> >>> >>
> >> >> >> >>> >> bq. Why doesn't Hudson encounter this problem?
> >> >> >> >>> >>
> >> >> >> >>> >> Forgive my ignorance, does Hudson also run tests or just
> >> >> >> >>> >> verify the build? These files are only used in tests!
> >> >> >> >>> >>
> >> >> >> >>> >> I agree we should correct it, and perhaps to prevent
> >> >> >> >>> >> other problems these files should be converted to UTF-8.
> >> >> >> >>> >>
> >> >> >> >>> >> For the record, I am still confused about these Java-code
> >> >> >> >>> >> analyzers that implement Snowball algorithms: why do they
> >> >> >> >>> >> exist when the same functionality is in contrib/snowball?
> >> >> >> >>> >>
> >> >> >> >>> >> > Don't use ensureOpen() excessively in IndexReader and IndexWriter
> >> >> >> >>> >> > -----------------------------------------------------------------
> >> >> >> >>> >> >
> >> >> >> >>> >> >         Key: LUCENE-1707
> >> >> >> >>> >> >         URL: https://issues.apache.org/jira/browse/LUCENE-1707
> >> >> >> >>> >> >     Project: Lucene - Java
> >> >> >> >>> >> >  Issue Type: Improvement
> >> >> >> >>> >> >  Components: Index
> >> >> >> >>> >> >    Reporter: Shai Erera
> >> >> >> >>> >> >     Fix For: 2.9
> >> >> >> >>> >> > Attachments: LUCENE-1707.patch, LUCENE-1707.patch
> >> >> >> >>> >> >
> >> >> >> >>> >> > A spin off from here:
> >> >> >> >>> >> > http://www.nabble.com/Excessive-use-of-ensureOpen()-td24127806.html.
> >> >> >> >>> >> > We should stop calling this method when it's not
> >> >> >> >>> >> > necessary for any internal Lucene code. Currently, this
> >> >> >> >>> >> > code seems to hurt properly written apps, unnecessarily.
> >> >> >> >>> >> > Will post a patch soon
> >> >> >> >>> >>
> >> >> >> >>> >> --
> >> >> >> >>> >> This message is automatically generated by JIRA.
> >> >> >> >>> >> -
> >> >> >> >>> >> You can reply to this email to add a comment to the issue online.
> >> >> >> >>> >>
> >> >> >> >>> >> ---------------------------------------------------------------------
> >> >> >> >>> >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> >> >> >> >>> >> For additional commands, e-mail: java-dev-h...@lucene.apache.org
> >> >> >> >>> >
> >> >> >> >>> > --
> >> >> >> >>> > - Mark
> >> >> >> >>> >
> >> >> >> >>> > http://www.lucidimagination.com
> >> >> >> >>>
> >> >> >> >>> --
> >> >> >> >>> Robert Muir
> >> >> >> >>> rcm...@gmail.com
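A minimal sketch of the Reader-based decoding described in the quoted 9:19 AM message, assuming a JDK that ships the KOI8-R charset. The class name ReaderCharsetSketch and the decode helper are hypothetical and are not taken from the Lucene test code. It contrasts giving the Reader the real encoding of the bytes with the ISO-8859-1 trick, which only widens each byte to a 16-bit char.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical illustration class, not part of Lucene.
class ReaderCharsetSketch {

    // Decode raw bytes by telling the Reader which encoding the bytes use.
    static String decode(byte[] raw, String charsetName) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), Charset.forName(charsetName))) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A short Cyrillic sample word, written with escapes so the source
        // file encoding does not matter.
        String word = "\u043c\u0438\u0440";
        byte[] koi8 = word.getBytes(Charset.forName("KOI8-R"));

        // Correct: the Reader knows the bytes are KOI8-R, so the resulting
        // Java chars are the real Cyrillic code points.
        System.out.println(decode(koi8, "KOI8-R").equals(word));     // true

        // The ISO-8859-1 trick: every byte becomes the char with the same
        // numeric value, so the chars are not Cyrillic letters at all.
        System.out.println(decode(koi8, "ISO-8859-1").equals(word)); // false
    }
}
```

With the first form, no further charset handling should be needed downstream, which matches the suggestion that all conversions be done by the Reader.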