I think the regenerated code in StandardTokenizer has not been generated with 1.4 for years :-) Most developers use 1.5 or even 1.6, so it has already changed incompatibly.
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

_____

From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, November 16, 2009 8:52 PM
To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

Uwe, that's probably a good solution, I think, just as long as we document it somewhere. I think there is already some warning verbiage in StandardTokenizer about this:

NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate the tokenizer, remember to use JRE 1.4 to run jflex (before Lucene 3.0). This grammar now uses constructs (eg :digit:, :letter:) whose meaning can vary according to the JRE used to run jflex. See https://issues.apache.org/jira/browse/LUCENE-1126 for details.

On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:

But it is a general warning that should be placed in the Wiki: If you upgrade from Java 1.4 to Java 5, think about reindexing. It definitely has nothing to do with 3.0, because users could have changed (and most of them have) before.

_____

From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, November 16, 2009 8:45 PM
To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

Right, my point is that it's true it has nothing to do with Lucene at all, really, but the reality is we should clarify this to users, I think. It's especially complex in the current StandardTokenizer, which uses a mix of hardcoded ranges and properties. Can you tell me if you should reindex for a given language X? I wouldn't want to answer that question right now.

On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <u...@thetaphi.de> wrote:

We tried out Character.getType() for these two chars:

Java 5:   '\u00AD' = 16   '\u06DD' = 16
Java 1.4: '\u00AD' = 20   '\u06DD' = 7

The first is the soft hyphen.
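
For anyone who wants to repeat this check on their own JRE, here is a minimal sketch in Java. The class name CharTypeProbe and the helpers typeOf and nameOf are made up for illustration and are not part of Lucene; only Character.getType() and the Character category constants are real JDK API. The result depends on the Unicode version of the running JRE, so no output is shown.

```java
// Hypothetical probe class for illustration; not part of Lucene.
final class CharTypeProbe {

    // Returns the general category the running JRE assigns to c.
    static int typeOf(char c) {
        return Character.getType(c);
    }

    // Maps the category values discussed in this thread to readable names.
    static String nameOf(int type) {
        switch (type) {
            case Character.FORMAT:           return "FORMAT";
            case Character.DASH_PUNCTUATION: return "DASH_PUNCTUATION";
            case Character.ENCLOSING_MARK:   return "ENCLOSING_MARK";
            case Character.NON_SPACING_MARK: return "NON_SPACING_MARK";
            default:                         return "other";
        }
    }

    public static void main(String[] args) {
        // Soft hyphen and Arabic end of ayah, the two chars from the thread.
        char[] probes = { '\u00AD', '\u06DD' };
        System.out.println("java.version = " + System.getProperty("java.version"));
        for (char c : probes) {
            int type = typeOf(c);
            System.out.println(String.format("U+%04X", (int) c)
                    + " -> " + type + " (" + nameOf(type) + ")");
        }
    }
}
```

In terms of the JDK constants, the values reported above read as 16 = FORMAT, 20 = DASH_PUNCTUATION and 7 = ENCLOSING_MARK. Running the probe under both the old and the new JRE is a cheap way to see whether the characters you index are affected.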

_____

From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, November 16, 2009 8:37 PM
To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

Right, it has nothing to do with Lucene; it is due to property changes, etc. I just think we should inform users on Java 1.4/2.9 that if they upgrade to Java 1.5/3.0, they should reindex.

The reason I say this about properties is that some of the ones that change will affect tokenizers. I gave two examples: a hyphen that changes from punctuation to format (might affect SolrWordDelimiterFilter), and the Arabic ayah, which changes from NSM to format, which surely affects ArabicLetterTokenizer.

On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sar...@syr.edu> wrote:

Hi Robert,

I agree that the Unicode version supported by the JVM, as you say, really has nothing to do with Lucene. The disruption here is users' upgrading from Java 1.4 to 1.5+, not when they upgrade Lucene. I'd guess with few exceptions that most people have been using Lucene with 1.5+ for a couple of years now, though.

But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on most Lucene users, assuming that most use Latin-1 exclusively; although I haven't looked, I'd be surprised if Latin-1 characters changed much, if at all, from Unicode 3.0 to 4.0.

It would be useful, I think, to include (a pointer to?) a description of the details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release notes, since the minimum required Java version, and so also the supported Unicode version, changes then.

Steve

On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> the problem is that the properties have changed for various characters,
> and new characters were added.
>
> it really has nothing to do with lucene, but the idea you can go from
> jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true.
>
> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
> But an UTF-8 stream from Java 4 can still be read with Java 5,
> what is the problem? Java 5 extended Unicode support, but an index
> created with older versions can still be read. UTF-8 is standardized.
>
> ________________________________
>
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Monday, November 16, 2009 8:09 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Why release 3.0?
>
> uwe, on topic please read my comment on LUCENE-1689, because
> unicode version was bumped in jdk 1.5, i believe this index backwards
> compatibility is only theoretical
>
> On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
> 2.9 has *not* the same format as 3.0, an index created with 3.0
> cannot be read with 2.9. This is because compressed field support was
> removed and therefore the version number of the stored fields file was
> upgraded. But indexes from 2.9 can be read with 3.0 and support may get
> removed in 4.0. 3.0 Indexes can be read until version 4.9.
>
> Uwe
>
> ________________________________
>
> From: Jake Mannix [mailto:jake.man...@gmail.com]
> Sent: Monday, November 16, 2009 7:15 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Why release 3.0?
>
> Don't users need to upgrade to 3.0 because 3.1 won't be
> necessarily able to read your 2.4 index file formats? I suppose if
> you've already upgraded to 2.9, then all is well because 2.9 is the
> same format as 3.0, but we can't assume all users upgraded from 2.4
> to 2.9.
>
> If you've done that already, then 3.0 might not be necessary,
> but if you're on 2.4 right now, you will be in for a bad surprise
> if you try to upgrade to 3.1.
>
> -jake
>
> On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> <erickerick...@gmail.com> wrote:
>
> One of my "specialties" is asking obvious questions just to see
> if everyone's assumptions are aligned. So with the discussion about
> branching 3.0 I have to ask "Is there going to be any 3.0 release
> intended for *production*?". And if not, would we save a lot of
> work by just not worrying about retrofitting fixes to a 3.0 branch
> and carrying on with 3.1 as the first *supported* 3.x release?
>
> Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> "beta/snapshot" release to get a head start on cleaning up my code
> does seem worthwhile, if I have the spare time. And having a base
> 3.0 version that's not changing all over the place would be useful
> for that.
>
> That said, I'm also not terribly comfortable with a "release"
> that's out there and unsupported.
>
> Apologies if this has already been discussed, but I don't
> remember it. Although my memory isn't what it used to be (but
> some would claim it never was<G>)...
>
> Erick

--
Robert Muir
rcm...@gmail.com