RE: Why release 3.0?

Uwe Schindler Mon, 16 Nov 2009 12:00:13 -0800

I think the code in StandardTokenizer has not been regenerated with 1.4 for
years :-) Most developers use 1.5 or even 1.6, so it has already changed
incompatibly.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

  _____  

From: Robert Muir [mailto:[email protected]] 
Sent: Monday, November 16, 2009 8:52 PM
To: [email protected]
Subject: Re: Why release 3.0?

Uwe, that's probably a good solution, I think, as long as we document it
somewhere. I think there is already some warning verbiage about this in
StandardTokenizer:

NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
      the tokenizer, remember to use JRE 1.4 to run jflex (before
      Lucene 3.0).  This grammar now uses constructs (eg :digit:,
      :letter:) whose meaning can vary according to the JRE used to
      run jflex.  See
      https://issues.apache.org/jira/browse/LUCENE-1126 for details.

On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <[email protected]> wrote:

But it is a general warning that should be placed in the Wiki: If you
upgrade from Java 1.4 to Java 5, think about reindexing.

It definitely has nothing to do with 3.0, because users could have switched
JVMs before (and most of them have).


  _____  

From: Robert Muir [mailto:[email protected]] 
Sent: Monday, November 16, 2009 8:45 PM
To: [email protected]
Subject: Re: Why release 3.0?

Right, my point is that it's true: it has nothing to do with Lucene at all,
really.

But in reality, I think we should clarify this for users.

It's especially complex in the current StandardTokenizer, which uses a mix of
hard-coded ranges and properties: can you tell me whether you should reindex
for a given language X? I wouldn't want to answer that question right now.
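One way to narrow that question (not answer it) is to record what Character.getType() reports for every character a language uses, under each JRE, and compare. A minimal sketch, assuming the Arabic block U+0600..U+06FF as the example range; the class name CategorySnapshot is made up for illustration. It only sees the per-character categories the JRE reports, not the hard-coded ranges in the StandardTokenizer grammar or jflex's own tables.

```java
public class CategorySnapshot {
    // Returns one line per char in [start, end]: "HHHH<TAB>category", where
    // category is the int returned by Character.getType(char).
    // Uses only Java 1.4-era APIs (no generics, no printf, no getType(int)),
    // so the same class should also compile and run on a 1.4 JRE.
    static String snapshot(char start, char end) {
        StringBuffer sb = new StringBuffer();
        for (int c = start; c <= end; c++) {
            String hex = Integer.toHexString(c).toUpperCase();
            while (hex.length() < 4) {
                hex = "0" + hex; // left-pad to four hex digits
            }
            sb.append(hex).append('\t').append(Character.getType((char) c)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example range only: the Arabic block, U+0600..U+06FF.
        System.out.print(snapshot('\u0600', '\u06FF'));
    }
}
```

Run it once under each JRE, redirect the output to two files, and diff them; every differing line is a character whose category changed. An empty diff does not by itself prove that a tokenizer's output is unchanged.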

On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <[email protected]> wrote:

We tried Character.getType() on these two chars:

Java 5:
'\u00AD' = 16 (FORMAT)
'\u06DD' = 16 (FORMAT)

Java 1.4:
'\u00AD' = 20 (DASH_PUNCTUATION)
'\u06DD' = 7 (ENCLOSING_MARK)

The first is the soft hyphen; the second is U+06DD ARABIC END OF AYAH.
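The numbers above are the int constants defined on java.lang.Character. A small sketch that repeats the check and prints the constant names; the class name CharTypeCheck is hypothetical, and it avoids Java 5 features so the same class can be run under both JREs.

```java
public class CharTypeCheck {
    // Names for the Character.getType() constants that appear in this thread.
    static String typeName(int type) {
        switch (type) {
            case Character.ENCLOSING_MARK:   return "ENCLOSING_MARK";   // 7
            case Character.FORMAT:           return "FORMAT";           // 16
            case Character.DASH_PUNCTUATION: return "DASH_PUNCTUATION"; // 20
            default:                         return "type " + type;
        }
    }

    public static void main(String[] args) {
        char[] chars = { '\u00AD', '\u06DD' }; // soft hyphen, Arabic end of ayah
        System.out.println("java.version = " + System.getProperty("java.version"));
        for (int i = 0; i < chars.length; i++) {
            int type = Character.getType(chars[i]);
            // 0x10000 | c gives five hex digits; dropping the leading '1' keeps four.
            String hex = Integer.toHexString(0x10000 | chars[i]).substring(1).toUpperCase();
            System.out.println("U+" + hex + " -> " + type + " (" + typeName(type) + ")");
        }
    }
}
```

On a Java 5 or later JRE both lines should report 16 (FORMAT), matching the Java 5 numbers above.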


  _____  

From: Robert Muir [mailto:[email protected]] 
Sent: Monday, November 16, 2009 8:37 PM
To: [email protected]
Subject: Re: Why release 3.0?

Right, it has nothing to do with Lucene; it's due to property changes, etc.

I just think we should inform users on Java 1.4/Lucene 2.9 that if they
upgrade to Java 1.5/Lucene 3.0, they should reindex.

The reason I mention properties is that some of the changes will affect
tokenizers. Two examples: a hyphen that changes from punctuation to format
(might affect SolrWordDelimiterFilter), and the Arabic ayah sign, which
changes from NSM to format and surely affects ArabicLetterTokenizer.
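To make the soft-hyphen case concrete, here is a toy splitter. Its rule is hypothetical and is not how SolrWordDelimiterFilter or any Lucene tokenizer actually decides where to split: it breaks on any character whose category is DASH_PUNCTUATION. It can patch in the Java 1.4 value for U+00AD quoted in this thread, so both behaviors can be compared on a single modern JRE.

```java
import java.util.ArrayList;
import java.util.List;

public class CategorySplitDemo {
    // Category of c as the running JRE reports it, or, when simulateJava14 is
    // true, with the one Java 1.4 value quoted in this thread patched in:
    // U+00AD was DASH_PUNCTUATION (20) there instead of FORMAT (16).
    static int categoryOf(char c, boolean simulateJava14) {
        if (simulateJava14 && c == '\u00AD') {
            return Character.DASH_PUNCTUATION;
        }
        return Character.getType(c);
    }

    // Hypothetical rule (NOT SolrWordDelimiterFilter's real logic): split the
    // input wherever a char's general category is DASH_PUNCTUATION.
    static List<String> split(String text, boolean simulateJava14) {
        List<String> parts = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (categoryOf(c, simulateJava14) == Character.DASH_PUNCTUATION) {
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        String word = "co\u00ADoperate"; // "cooperate" with a soft hyphen inside
        System.out.println("1.4 categories: " + split(word, true).size() + " token(s)");
        System.out.println("5+  categories: " + split(word, false).size() + " token(s)");
    }
}
```

Under the simulated 1.4 table the word splits into "co" and "operate"; under the real Java 5+ table the soft hyphen is FORMAT, so the whole string stays one token. An index built under the first behavior would hold different terms than queries analyzed under the second.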

On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <[email protected]> wrote:

Hi Robert,

I agree that the Unicode version supported by the JVM, as you say, really
has nothing to do with Lucene.

The disruption here is users' upgrading from Java 1.4 to 1.5+, not when they
upgrade Lucene.  I'd guess with few exceptions that most people have been
using Lucene with 1.5+ for a couple of years now, though.

But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
most Lucene users, assuming that most use Latin-1 exclusively; although I
haven't looked, I'd be surprised if Latin-1 characters changed much, if at
all, from Unicode 3.0 to 4.0.

It would be useful, I think, to include (a pointer to?) a description of the
details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release notes,
since the minimum required Java version, and so also the supported Unicode
version, changes then.

Steve

On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> The problem is that the properties of various characters have changed,
> and new characters were added.
>
> It really has nothing to do with Lucene, but the idea that you can go from
> JDK 1.4/Lucene 2.9 to JDK 1.5/Lucene 3.0 without reindexing is not true.
>
>
> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <[email protected]> wrote:
>
>
>       But a UTF-8 stream written with Java 1.4 can still be read with
> Java 5, so what is the problem? Java 5 extended Unicode support, but an
> index created with older versions can still be read. UTF-8 is standardized.
>
>
>
>
>
> ________________________________
>
>
>       From: Robert Muir [mailto:[email protected]]
>       Sent: Monday, November 16, 2009 8:09 PM
>       To: [email protected]
>       Subject: Re: Why release 3.0?
>
>
>
>       Uwe, on this topic please read my comment on LUCENE-1689: because
> the Unicode version was bumped in JDK 1.5, I believe this index backwards
> compatibility is only theoretical.
>
>       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <[email protected]> wrote:
>
>       2.9 does *not* have the same format as 3.0: an index created with 3.0
> cannot be read by 2.9. This is because compressed field support was
> removed, and therefore the version number of the stored fields file was
> bumped. But indexes from 2.9 can be read by 3.0, and that support may be
> removed in 4.0. 3.0 indexes can be read up to version 4.9.
>
>
>
>       Uwe
>
>
>
> ________________________________
>
>
>       From: Jake Mannix [mailto:[email protected]]
>       Sent: Monday, November 16, 2009 7:15 PM
>       To: [email protected]
>       Subject: Re: Why release 3.0?
>
>
>
>       Don't users need to upgrade to 3.0 because 3.1 won't necessarily be
> able to read your 2.4 index file formats? I suppose if you've already
> upgraded to 2.9, then all is well because 2.9 is the same format as 3.0,
> but we can't assume all users upgraded from 2.4 to 2.9.
>
>       If you've done that already, then 3.0 might not be necessary, but if
> you're on 2.4 right now, you will be in for a bad surprise if you try to
> upgrade to 3.1.
>
>         -jake
>
>       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> <[email protected]> wrote:
>
>       One of my "specialties" is asking obvious questions just to see
> if everyone's assumptions are aligned. So with the discussion about
> branching 3.0 I have to ask "Is there going to be any 3.0 release
> intended for *production*?". And if not, would we save a lot of
> work by just not worrying about retrofitting fixes to a 3.0 branch
> and carrying on with 3.1 as the first *supported* 3.x release?
>
>       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> "beta/snapshot" release to get a head start on cleaning up my code
> does seem worthwhile, if I have the spare time. And having a base
> 3.0 version that's not changing all over the place would be useful
> for that.
>
>       That said, I'm also not terribly comfortable with a "release"
> that's out there and unsupported.
>
>       Apologies if this has already been discussed, but I don't
> remember it. Although my memory isn't what it used to be (but
> some would claim it never was<G>)...
>
>       Erick

-- 
Robert Muir
[email protected]

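Uwe's point (the UTF-8 bytes in an old index are still readable) and Robert's point (the analysis that produced those bytes has changed) are not in conflict. A minimal sketch that shows both halves side by side; the class name Utf8VsCategory is made up for illustration.

```java
import java.io.UnsupportedEncodingException;

public class Utf8VsCategory {
    // Hex dump of a string's UTF-8 bytes, e.g. "C2 AD" for U+00AD.
    static String utf8Hex(String s) {
        byte[] bytes;
        try {
            bytes = s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JRE is required to support UTF-8, so this should not happen.
            throw new RuntimeException(e);
        }
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // 0x100 | b gives three hex digits; dropping the leading '1' keeps two.
            sb.append(Integer.toHexString(0x100 | (bytes[i] & 0xFF)).substring(1).toUpperCase());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] samples = { "\u00AD", "\u06DD" };
        for (int i = 0; i < samples.length; i++) {
            // Bytes: fixed by the UTF-8 encoding, identical on every JRE.
            // Category: looked up in the running JRE's Unicode tables.
            System.out.println(utf8Hex(samples[i]) + " -> getType "
                    + Character.getType(samples[i].charAt(0)));
        }
    }
}
```

The byte column comes out the same on Java 1.4 and Java 5, which is why old indexes stay readable; the category column is what differs, and it is what decides which terms a tokenizer puts into the index in the first place.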
