[lucenenet] 01/05: Lucene.Net: Addeed migration-guide.md (originally MIGRATE.txt). This is the original text from Lucene 4.8.1 without changes.

nightowl888 Tue, 23 Mar 2021 20:34:39 -0700

This is an automated email from the ASF dual-hosted git repository.

nightowl888 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/lucenenet.git


commit b1212322cb43ca17b66ff5bfbba9eaf987c64829
Author: Shad Storhaug <[email protected]>
AuthorDate: Wed Mar 24 02:30:05 2021 +0700

    Lucene.Net: Addeed migration-guide.md (originally MIGRATE.txt). This is the 
original text from Lucene 4.8.1 without changes.
---
 src/Lucene.Net/migration-guide.md | 641 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 641 insertions(+)

diff --git a/src/Lucene.Net/migration-guide.md 
b/src/Lucene.Net/migration-guide.md
new file mode 100644
index 0000000..af091b4
--- /dev/null
+++ b/src/Lucene.Net/migration-guide.md
@@ -0,0 +1,641 @@
+# Apache Lucene Migration Guide
+
+## Four-dimensional enumerations
+
+Flexible indexing changed the low level fields/terms/docs/positions
+enumeration APIs.  Here are the major changes:
+
+  * Terms are now binary in nature (arbitrary byte[]), represented
+    by the BytesRef class (which provides an offset + length "slice"
+    into an existing byte[]).
+
+  * Fields are separately enumerated (Fields.iterator()) from the terms
+    within each field (TermEnum).  So instead of this:
+
+        TermEnum termsEnum = ...;
+        while(termsEnum.next()) {
+          Term t = termsEnum.term();
+          System.out.println("field=" + t.field() + "; text=" + t.text());
+        }
+
+    Do this:
+
+        for(String field : fields) {
+            Terms terms = fields.terms(field);
+            TermsEnum termsEnum = terms.iterator(null);
+            BytesRef text;
+            while((text = termsEnum.next()) != null) {
+              System.out.println("field=" + field + "; text=" + 
text.utf8ToString());
+          }
+        }
+
+  * TermDocs is renamed to DocsEnum.  Instead of this:
+
+        while(td.next()) {
+          int doc = td.doc();
+          ...
+        }
+
+    do this:
+
+        int doc;
+        while((doc = td.next()) != DocsEnum.NO_MORE_DOCS) {
+          ...
+        }
+
+    Instead of this:
+    
+        if (td.skipTo(target)) {
+          int doc = td.doc();
+          ...
+        }
+
+    do this:
+    
+        if ((doc=td.advance(target)) != DocsEnum.NO_MORE_DOCS) {
+          ...
+        }
+
+  * TermPositions is renamed to DocsAndPositionsEnum, and no longer
+    extends the docs only enumerator (DocsEnum).
+
+  * Deleted docs are no longer implicitly filtered from
+    docs/positions enums.  Instead, you pass a Bits
+    skipDocs (set bits are skipped) when obtaining the enums.  Also,
+    you can now ask a reader for its deleted docs.
+
+  * The docs/positions enums cannot seek to a term.  Instead,
+    TermsEnum is able to seek, and then you request the
+    docs/positions enum from that TermsEnum.
+
+  * TermsEnum's seek method returns more information.  So instead of
+    this:
+
+        Term t;
+        TermEnum termEnum = reader.terms(t);
+        if (t.equals(termEnum.term())) {
+          ...
+        }
+
+    do this:
+
+        TermsEnum termsEnum = ...;
+        BytesRef text;
+        if (termsEnum.seek(text) == TermsEnum.SeekStatus.FOUND) {
+          ...
+        }
+
+    SeekStatus also contains END (enumerator is done) and NOT_FOUND
+    (term was not found but enumerator is now positioned to the next
+    term).
+
+  * TermsEnum has an ord() method, returning the long numeric
+    ordinal (ie, first term is 0, next is 1, and so on) for the term
+    it's not positioned to.  There is also a corresponding seek(long
+    ord) method.  Note that these methods are optional; in
+    particular the MultiFields TermsEnum does not implement them.
+
+
+  * How you obtain the enums has changed.  The primary entry point is
+    the Fields class.  If you know your reader is a single segment
+    reader, do this:
+
+        Fields fields = reader.Fields();
+        if (fields != null) {
+          ...
+        }
+
+    If the reader might be multi-segment, you must do this:
+    
+        Fields fields = MultiFields.getFields(reader);
+        if (fields != null) {
+          ...
+        }
+  
+    The fields may be null (eg if the reader has no fields).
+
+    Note that the MultiFields approach entails a performance hit on
+    MultiReaders, as it must merge terms/docs/positions on the fly. It's
+    generally better to instead get the sequential readers (use
+    oal.util.ReaderUtil) and then step through those readers yourself,
+    if you can (this is how Lucene drives searches).
+
+    If you pass a SegmentReader to MultiFields.fields it will simply
+    return reader.fields(), so there is no performance hit in that
+    case.
+
+    Once you have a non-null Fields you can do this:
+
+        Terms terms = fields.terms("field");
+        if (terms != null) {
+          ...
+        }
+
+    The terms may be null (eg if the field does not exist).
+
+    Once you have a non-null terms you can get an enum like this:
+
+        TermsEnum termsEnum = terms.iterator();
+
+    The returned TermsEnum will not be null.
+
+    You can then .next() through the TermsEnum, or seek.  If you want a
+    DocsEnum, do this:
+
+        Bits liveDocs = reader.getLiveDocs();
+        DocsEnum docsEnum = null;
+
+        docsEnum = termsEnum.docs(liveDocs, docsEnum, needsFreqs);
+
+    You can pass in a prior DocsEnum and it will be reused if possible.
+
+    Likewise for DocsAndPositionsEnum.
+
+    IndexReader has several sugar methods (which just go through the
+    above steps, under the hood).  Instead of:
+
+        Term t;
+        TermDocs termDocs = reader.termDocs();
+        termDocs.seek(t);
+
+    do this:
+
+        Term t;
+        DocsEnum docsEnum = reader.termDocsEnum(t);
+
+    Likewise for DocsAndPositionsEnum.
+
+## LUCENE-2380: FieldCache.getStrings/Index --> FieldCache.getDocTerms/Index
+
+  * The field values returned when sorting by SortField.STRING are now
+    BytesRef.  You can call value.utf8ToString() to convert back to
+    string, if necessary.
+
+  * In FieldCache, getStrings (returning String[]) has been replaced
+    with getTerms (returning a FieldCache.DocTerms instance).
+    DocTerms provides a getTerm method, taking a docID and a BytesRef
+    to fill (which must not be null), and it fills it in with the
+    reference to the bytes for that term.
+
+    If you had code like this before:
+
+        String[] values = FieldCache.DEFAULT.getStrings(reader, field);
+        ...
+        String aValue = values[docID];
+
+    you can do this instead:
+
+        DocTerms values = FieldCache.DEFAULT.getTerms(reader, field);
+        ...
+        BytesRef term = new BytesRef();
+        String aValue = values.getTerm(docID, term).utf8ToString();
+
+    Note however that it can be costly to convert to String, so it's
+    better to work directly with the BytesRef.
+
+  * Similarly, in FieldCache, getStringIndex (returning a StringIndex
+    instance, with direct arrays int[] order and String[] lookup) has
+    been replaced with getTermsIndex (returning a
+    FieldCache.DocTermsIndex instance).  DocTermsIndex provides the
+    getOrd(int docID) method to lookup the int order for a document,
+    lookup(int ord, BytesRef reuse) to lookup the term from a given
+    order, and the sugar method getTerm(int docID, BytesRef reuse)
+    which internally calls getOrd and then lookup.
+
+    If you had code like this before:
+
+        StringIndex idx = FieldCache.DEFAULT.getStringIndex(reader, field);
+        ...
+        int ord = idx.order[docID];
+        String aValue = idx.lookup[ord];
+
+    you can do this instead:
+
+        DocTermsIndex idx = FieldCache.DEFAULT.getTermsIndex(reader, field);
+        ...
+        int ord = idx.getOrd(docID);
+        BytesRef term = new BytesRef();
+        String aValue = idx.lookup(ord, term).utf8ToString();
+
+    Note however that it can be costly to convert to String, so it's
+    better to work directly with the BytesRef.
+
+    DocTermsIndex also has a getTermsEnum() method, which returns an
+    iterator (TermsEnum) over the term values in the index (ie,
+    iterates ord = 0..numOrd()-1).
+
+  * StringComparatorLocale is now more CPU costly than it was before
+    (it was already very CPU costly since it does not compare using
+    indexed collation keys; use CollationKeyFilter for better
+    performance), since it converts BytesRef -> String on the fly.
+    Also, the field values returned when sorting by SortField.STRING
+    are now BytesRef.
+
+  * FieldComparator.StringOrdValComparator has been renamed to
+    TermOrdValComparator, and now uses BytesRef for its values.
+    Likewise for StringValComparator, renamed to TermValComparator.
+    This means when sorting by SortField.STRING or
+    SortField.STRING_VAL (or directly invoking these comparators) the
+    values returned in the FieldDoc.fields array will be BytesRef not
+    String.  You can call the .utf8ToString() method on the BytesRef
+    instances, if necessary.
+
+## LUCENE-2600: IndexReaders are now read-only
+
+  Instead of IndexReader.isDeleted, do this:
+
+      import org.apache.lucene.util.Bits;
+      import org.apache.lucene.index.MultiFields;
+
+      Bits liveDocs = MultiFields.getLiveDocs(indexReader);
+      if (liveDocs != null && !liveDocs.get(docID)) {
+        // document is deleted...
+      }
+    
+## LUCENE-2858, LUCENE-3733: IndexReader --> 
AtomicReader/CompositeReader/DirectoryReader refactoring
+
+The abstract class IndexReader has been 
+refactored to expose only essential methods to access stored fields 
+during display of search results. It is no longer possible to retrieve 
+terms or postings data from the underlying index, not even deletions are 
+visible anymore. You can still pass IndexReader as constructor parameter 
+to IndexSearcher and execute your searches; Lucene will automatically 
+delegate procedures like query rewriting and document collection atomic 
+subreaders. 
+
+If you want to dive deeper into the index and want to write own queries, 
+take a closer look at the new abstract sub-classes AtomicReader and 
+CompositeReader: 
+
+AtomicReader instances are now the only source of Terms, Postings, 
+DocValues and FieldCache. Queries are forced to execute on a Atomic 
+reader on a per-segment basis and FieldCaches are keyed by 
+AtomicReaders. 
+
+Its counterpart CompositeReader exposes a utility method to retrieve 
+its composites. But watch out, composites are not necessarily atomic. 
+Next to the added type-safety we also removed the notion of 
+index-commits and version numbers from the abstract IndexReader, the 
+associations with IndexWriter were pulled into a specialized 
+DirectoryReader. To open Directory-based indexes use 
+DirectoryReader.open(), the corresponding method in IndexReader is now 
+deprecated for easier migration. Only DirectoryReader supports commits, 
+versions, and reopening with openIfChanged(). Terms, postings, 
+docvalues, and norms can from now on only be retrieved using 
+AtomicReader; DirectoryReader and MultiReader extend CompositeReader, 
+only offering stored fields and access to the sub-readers (which may be 
+composite or atomic). 
+
+If you have more advanced code dealing with custom Filters, you might 
+have noticed another new class hierarchy in Lucene (see LUCENE-2831): 
+IndexReaderContext with corresponding Atomic-/CompositeReaderContext. 
+
+The move towards per-segment search Lucene 2.9 exposed lots of custom 
+Queries and Filters that couldn't handle it. For example, some Filter 
+implementations expected the IndexReader passed in is identical to the 
+IndexReader passed to IndexSearcher with all its advantages like 
+absolute document IDs etc. Obviously this "paradigm-shift" broke lots of 
+applications and especially those that utilized cross-segment data 
+structures (like Apache Solr). 
+
+In Lucene 4.0, we introduce IndexReaderContexts "searcher-private" 
+reader hierarchy. During Query or Filter execution Lucene no longer 
+passes raw readers down Queries, Filters or Collectors; instead 
+components are provided an AtomicReaderContext (essentially a hierarchy 
+leaf) holding relative properties like the document-basis in relation to 
+the top-level reader. This allows Queries & Filter to build up logic 
+based on document IDs, albeit the per-segment orientation. 
+
+There are still valid use-cases where top-level readers ie. "atomic 
+views" on the index are desirable. Let say you want to iterate all terms 
+of a complete index for auto-completion or faceting, Lucene provides
+utility wrappers like SlowCompositeReaderWrapper (LUCENE-2597) emulating 
+an AtomicReader. Note: using "atomicity emulators" can cause serious 
+slowdowns due to the need to merge terms, postings, DocValues, and 
+FieldCache, use them with care! 
+
+## LUCENE-4306: getSequentialSubReaders(), ReaderUtil.Gather
+
+The method IndexReader#getSequentialSubReaders() was moved to CompositeReader
+(see LUCENE-2858, LUCENE-3733) and made protected. It is solely used by
+CompositeReader itself to build its reader tree. To get all atomic leaves
+of a reader, use IndexReader#leaves(), which also provides the doc base
+of each leave. Readers that are already atomic return itself as leaf with
+doc base 0. To emulate Lucene 3.x getSequentialSubReaders(),
+use getContext().children().
+
+## LUCENE-2413,LUCENE-3396: Analyzer package changes
+
+Lucene's core and contrib analyzers, along with Solr's analyzers,
+were consolidated into lucene/analysis. During the refactoring some
+package names have changed, and ReusableAnalyzerBase was renamed to
+Analyzer:
+
+  - o.a.l.analysis.KeywordAnalyzer -> o.a.l.analysis.core.KeywordAnalyzer
+  - o.a.l.analysis.KeywordTokenizer -> o.a.l.analysis.core.KeywordTokenizer
+  - o.a.l.analysis.LetterTokenizer -> o.a.l.analysis.core.LetterTokenizer
+  - o.a.l.analysis.LowerCaseFilter -> o.a.l.analysis.core.LowerCaseFilter
+  - o.a.l.analysis.LowerCaseTokenizer -> o.a.l.analysis.core.LowerCaseTokenizer
+  - o.a.l.analysis.SimpleAnalyzer -> o.a.l.analysis.core.SimpleAnalyzer
+  - o.a.l.analysis.StopAnalyzer -> o.a.l.analysis.core.StopAnalyzer
+  - o.a.l.analysis.StopFilter -> o.a.l.analysis.core.StopFilter
+  - o.a.l.analysis.WhitespaceAnalyzer -> o.a.l.analysis.core.WhitespaceAnalyzer
+  - o.a.l.analysis.WhitespaceTokenizer -> 
o.a.l.analysis.core.WhitespaceTokenizer
+  - o.a.l.analysis.PorterStemFilter -> o.a.l.analysis.en.PorterStemFilter
+  - o.a.l.analysis.ASCIIFoldingFilter -> 
o.a.l.analysis.miscellaneous.ASCIIFoldingFilter
+  - o.a.l.analysis.ISOLatin1AccentFilter -> 
o.a.l.analysis.miscellaneous.ISOLatin1AccentFilter
+  - o.a.l.analysis.KeywordMarkerFilter -> 
o.a.l.analysis.miscellaneous.KeywordMarkerFilter
+  - o.a.l.analysis.LengthFilter -> o.a.l.analysis.miscellaneous.LengthFilter
+  - o.a.l.analysis.PerFieldAnalyzerWrapper -> 
o.a.l.analysis.miscellaneous.PerFieldAnalyzerWrapper
+  - o.a.l.analysis.TeeSinkTokenFilter -> 
o.a.l.analysis.sinks.TeeSinkTokenFilter
+  - o.a.l.analysis.CharFilter -> o.a.l.analysis.charfilter.CharFilter
+  - o.a.l.analysis.BaseCharFilter -> o.a.l.analysis.charfilter.BaseCharFilter
+  - o.a.l.analysis.MappingCharFilter -> 
o.a.l.analysis.charfilter.MappingCharFilter
+  - o.a.l.analysis.NormalizeCharMap -> 
o.a.l.analysis.charfilter.NormalizeCharMap
+  - o.a.l.analysis.CharArraySet -> o.a.l.analysis.util.CharArraySet
+  - o.a.l.analysis.CharArrayMap -> o.a.l.analysis.util.CharArrayMap
+  - o.a.l.analysis.ReusableAnalyzerBase -> o.a.l.analysis.Analyzer
+  - o.a.l.analysis.StopwordAnalyzerBase -> 
o.a.l.analysis.util.StopwordAnalyzerBase
+  - o.a.l.analysis.WordListLoader -> o.a.l.analysis.util.WordListLoader
+  - o.a.l.analysis.CharTokenizer -> o.a.l.analysis.util.CharTokenizer
+  - o.a.l.util.CharacterUtils -> o.a.l.analysis.util.CharacterUtils
+
+## LUCENE-2514: Collators
+
+The option to use a Collator's order (instead of binary order) for
+sorting and range queries has been moved to lucene/queries.
+The Collated TermRangeQuery/Filter has been moved to 
SlowCollatedTermRangeQuery/Filter, 
+and the collated sorting has been moved to SlowCollatedStringComparator.
+
+Note: this functionality isn't very scalable and if you are using it, consider 
+indexing collation keys with the collation support in the analysis module 
instead.
+
+To perform collated range queries, use a suitable collating analyzer: 
CollationKeyAnalyzer 
+or ICUCollationKeyAnalyzer, and set qp.setAnalyzeRangeTerms(true).
+
+TermRangeQuery and TermRangeFilter now work purely on bytes. Both have helper 
factory methods
+(newStringRange) similar to the NumericRange API, to easily perform range 
queries on Strings.
+
+## LUCENE-2883: ValueSource changes
+
+Lucene's o.a.l.search.function ValueSource based functionality, was 
consolidated
+into lucene/queries along with Solr's similar functionality.  The following 
classes were moved:
+
+ - o.a.l.search.function.CustomScoreQuery -> o.a.l.queries.CustomScoreQuery
+ - o.a.l.search.function.CustomScoreProvider -> 
o.a.l.queries.CustomScoreProvider
+ - o.a.l.search.function.NumericIndexDocValueSource -> 
o.a.l.queries.function.valuesource.NumericIndexDocValueSource
+
+The following lists the replacement classes for those removed:
+
+ - o.a.l.search.function.ByteFieldSource -> 
o.a.l.queries.function.valuesource.ByteFieldSource
+ - o.a.l.search.function.DocValues -> o.a.l.queries.function.DocValues
+ - o.a.l.search.function.FieldCacheSource -> 
o.a.l.queries.function.valuesource.FieldCacheSource
+ - o.a.l.search.function.FieldScoreQuery ->o.a.l.queries.function.FunctionQuery
+ - o.a.l.search.function.FloatFieldSource -> 
o.a.l.queries.function.valuesource.FloatFieldSource
+ - o.a.l.search.function.IntFieldSource -> 
o.a.l.queries.function.valuesource.IntFieldSource
+ - o.a.l.search.function.OrdFieldSource -> 
o.a.l.queries.function.valuesource.OrdFieldSource
+ - o.a.l.search.function.ReverseOrdFieldSource -> 
o.a.l.queries.function.valuesource.ReverseOrdFieldSource
+ - o.a.l.search.function.ShortFieldSource -> 
o.a.l.queries.function.valuesource.ShortFieldSource
+ - o.a.l.search.function.ValueSource -> o.a.l.queries.function.ValueSource
+ - o.a.l.search.function.ValueSourceQuery -> 
o.a.l.queries.function.FunctionQuery
+
+DocValues are now named FunctionValues, to not confuse with Lucene's 
per-document values.
+
+## LUCENE-2392: Enable flexible scoring
+
+The existing "Similarity" api is now TFIDFSimilarity, if you were extending
+Similarity before, you should likely extend this instead.
+
+Weight.normalize no longer takes a norm value that incorporates the top-level
+boost from outer queries such as BooleanQuery, instead it takes 2 parameters,
+the outer boost (topLevelBoost) and the norm. Weight.sumOfSquaredWeights has
+been renamed to Weight.getValueForNormalization().
+
+The scorePayload method now takes a BytesRef. It is never null.
+
+## LUCENE-3283: Query parsers moved to separate module
+
+Lucene's core o.a.l.queryParser QueryParsers have been consolidated into 
lucene/queryparser,
+where other QueryParsers from the codebase will also be placed.  The following 
classes were moved:
+
+  - o.a.l.queryParser.CharStream -> o.a.l.queryparser.classic.CharStream
+  - o.a.l.queryParser.FastCharStream -> 
o.a.l.queryparser.classic.FastCharStream
+  - o.a.l.queryParser.MultiFieldQueryParser -> 
o.a.l.queryparser.classic.MultiFieldQueryParser
+  - o.a.l.queryParser.ParseException -> 
o.a.l.queryparser.classic.ParseException
+  - o.a.l.queryParser.QueryParser -> o.a.l.queryparser.classic.QueryParser
+  - o.a.l.queryParser.QueryParserBase -> 
o.a.l.queryparser.classic.QueryParserBase
+  - o.a.l.queryParser.QueryParserConstants -> 
o.a.l.queryparser.classic.QueryParserConstants
+  - o.a.l.queryParser.QueryParserTokenManager -> 
o.a.l.queryparser.classic.QueryParserTokenManager
+  - o.a.l.queryParser.QueryParserToken -> o.a.l.queryparser.classic.Token
+  - o.a.l.queryParser.QueryParserTokenMgrError -> 
o.a.l.queryparser.classic.TokenMgrError
+
+## LUCENE-2308, LUCENE-3453: Separate IndexableFieldType from Field instances
+
+With this change, the indexing details (indexed, tokenized, norms,
+indexOptions, stored, etc.) are moved into a separate FieldType
+instance (rather than being stored directly on the Field).
+
+This means you can create the FieldType instance once, up front,
+for a given field, and then re-use that instance whenever you instantiate
+the Field.
+
+Certain field types are pre-defined since they are common cases:
+
+  * StringField: indexes a String value as a single token (ie, does
+    not tokenize).  This field turns off norms and indexes only doc
+    IDS (does not index term frequency nor positions).  This field
+    does not store its value, but exposes TYPE_STORED as well.
+  * TextField: indexes and tokenizes a String, Reader or TokenStream
+    value, without term vectors.  This field does not store its value,
+    but exposes TYPE_STORED as well.
+  * StoredField: field that stores its value
+  * DocValuesField: indexes the value as a DocValues field
+  * NumericField: indexes the numeric value so that NumericRangeQuery
+    can be used at search-time.
+
+If your usage fits one of those common cases you can simply
+instantiate the above class.  If you need to store the value, you can
+add a separate StoredField to the document, or you can use
+TYPE_STORED for the field:
+
+    Field f = new Field("field", "value", StringField.TYPE_STORED);
+
+Alternatively, if an existing type is close to what you want but you
+need to make a few changes, you can copy that type and make changes:
+
+    FieldType bodyType = new FieldType(TextField.TYPE_STORED);
+    bodyType.setStoreTermVectors(true);
+
+You can of course also create your own FieldType from scratch:
+
+    FieldType t = new FieldType();
+    t.setIndexed(true);
+    t.setStored(true);
+    t.setOmitNorms(true);
+    t.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
+    t.freeze();
+
+FieldType has a freeze() method to prevent further changes.
+
+There is also a deprecated transition API, providing the same Index,
+Store, TermVector enums from 3.x, and Field constructors taking these
+enums.
+
+When migrating from the 3.x API, if you did this before:
+
+    new Field("field", "value", Field.Store.NO, 
Field.Indexed.NOT_ANALYZED_NO_NORMS)
+
+you can now do this:
+
+    new StringField("field", "value")
+
+(though note that StringField indexes DOCS_ONLY).
+
+If instead the value was stored:
+
+    new Field("field", "value", Field.Store.YES, 
Field.Indexed.NOT_ANALYZED_NO_NORMS)
+
+you can now do this:
+
+    new Field("field", "value", StringField.TYPE_STORED)
+
+If you didn't omit norms:
+
+    new Field("field", "value", Field.Store.YES, Field.Indexed.NOT_ANALYZED)
+
+you can now do this:
+
+    FieldType ft = new FieldType(StringField.TYPE_STORED);
+    ft.setOmitNorms(false);
+    new Field("field", "value", ft)
+
+If you did this before (value can be String or Reader):
+
+    new Field("field", value, Field.Store.NO, Field.Indexed.ANALYZED)
+
+you can now do this:
+
+    new TextField("field", value, Field.Store.NO)
+
+If instead the value was stored:
+
+    new Field("field", value, Field.Store.YES, Field.Indexed.ANALYZED)
+
+you can now do this:
+
+    new TextField("field", value, Field.Store.YES)
+
+If in addition you omit norms:
+
+    new Field("field", value, Field.Store.YES, Field.Indexed.ANALYZED_NO_NORMS)
+
+you can now do this:
+
+    FieldType ft = new FieldType(TextField.TYPE_STORED);
+    ft.setOmitNorms(true);
+    new Field("field", value, ft)
+
+If you did this before (bytes is a byte[]):
+
+    new Field("field", bytes)
+
+you can now do this:
+
+    new StoredField("field", bytes)
+
+If you previously used Document.setBoost, you must now pre-multiply
+the document boost into each Field.setBoost.  If you have a
+multi-valued field, you should do this only for the first Field
+instance (ie, subsequent Field instance sharing the same field name
+should only include their per-field boost and not the document level
+boost) as the boost for multi-valued field instances are multiplied
+together by Lucene.
+
+## Other changes
+
+* LUCENE-2674:
+  A new idfExplain method was added to Similarity, that
+  accepts an incoming docFreq.  If you subclass Similarity, make sure
+  you also override this method on upgrade, otherwise your
+  customizations won't run for certain MultiTermQuerys.
+
+* LUCENE-2691: The near-real-time API has moved from IndexWriter to
+  DirectoryReader.  Instead of IndexWriter.getReader(), call
+  DirectoryReader.open(IndexWriter) or 
DirectoryReader.openIfChanged(IndexWriter).
+
+* LUCENE-2690: MultiTermQuery boolean rewrites per segment.
+  Also MultiTermQuery.getTermsEnum() now takes an AttributeSource. 
FuzzyTermsEnum
+  is both consumer and producer of attributes: MTQ.BoostAttribute is
+  added to the FuzzyTermsEnum and MTQ's rewrite mode consumes it.
+  The other way round MTQ.TopTermsBooleanQueryRewrite supplies a
+  global AttributeSource to each segments TermsEnum. The TermsEnum is consumer
+  and gets the current minimum competitive boosts 
(MTQ.MaxNonCompetitiveBoostAttribute).
+
+* LUCENE-2374: The backwards layer in AttributeImpl was removed. To support 
correct
+  reflection of AttributeImpl instances, where the reflection was done using 
deprecated
+  toString() parsing, you have to now override reflectWith() to customize 
output.
+  toString() is no longer implemented by AttributeImpl, so if you have 
overridden
+  toString(), port your customization over to reflectWith(). reflectAsString() 
would
+  then return what toString() did before.
+
+* LUCENE-2236, LUCENE-2912: DefaultSimilarity can no longer be set statically 
+  (and dangerously) for the entire JVM.
+  Similarity can now be configured on a per-field basis (via 
PerFieldSimilarityWrapper)
+  Similarity has a lower-level API, if you want the higher-level vector-space 
API
+  like in previous Lucene releases, then look at TFIDFSimilarity.
+
+* LUCENE-1076: TieredMergePolicy is now the default merge policy.
+  It's able to merge non-contiguous segments; this may cause problems
+  for applications that rely on Lucene's internal document ID
+  assignment.  If so, you should instead use LogByteSize/DocMergePolicy
+  during indexing.
+
+* LUCENE-3722: Similarity methods and collection/term statistics now take
+  long instead of int (to enable distributed scoring of > 2B docs). 
+  For example, in TFIDFSimilarity idf(int, int) is now idf(long, long). 
+
+* LUCENE-3559: The methods "docFreq" and "maxDoc" on IndexSearcher were 
removed,
+  as these are no longer used by the scoring system.
+  If you were using these casually in your code for reasons unrelated to 
scoring,
+  call them on the IndexSearcher's reader instead: getIndexReader().
+  If you were subclassing IndexSearcher and overriding these methods to alter
+  scoring, override IndexSearcher's termStatistics() and collectionStatistics()
+  methods instead.
+
+* LUCENE-3396: Analyzer.tokenStream() and .reusableTokenStream() have been 
made final.
+  It is now necessary to use Analyzer.TokenStreamComponents to define an 
analysis process.
+  Analyzer also has its own way of managing the reuse of TokenStreamComponents 
(either
+  globally, or per-field).  To define another Strategy, implement 
Analyzer.ReuseStrategy.
+
+* LUCENE-3464: IndexReader.reopen has been renamed to
+  DirectoryReader.openIfChanged (a static method), and now returns null
+  (instead of the old reader) if there are no changes to the index, to
+  prevent the common pitfall of accidentally closing the old reader.
+  
+* LUCENE-3687: Similarity#computeNorm() now expects a Norm object to set the 
computed 
+  norm value instead of returning a fixed single byte value. Custom 
similarities can now
+  set integer, float and byte values if a single byte is not sufficient.
+
+* LUCENE-2621: Term vectors are now accessed via flexible indexing API.
+  If you used IndexReader.getTermFreqVector/s before, you should now
+  use IndexReader.getTermVectors.  The new method returns a Fields
+  instance exposing the inverted index of the one document.  From
+  Fields you can enumerate all fields, terms, positions, offsets.
+
+* LUCENE-4227: If you were previously using Instantiated index, you
+  may want to use DirectPostingsFormat after upgrading: it stores all
+  postings in simple arrrays (byte[] for terms, int[] for docs, freqs,
+  positions, offsets).  Note that this only covers postings, whereas
+  Instantiated covered all other parts of the index as well.
+
+* LUCENE-3309: The expert FieldSelector API has been replaced with
+  StoredFieldVisitor.  The idea is the same (you have full control
+  over which fields should be loaded).  Instead of a single accept
+  method, StoredFieldVisitor has a needsField method: if that method
+  returns true then the field will be loaded and the appropriate
+  type-specific method will be invoked with that fields's value.
+
+* LUCENE-4122: Removed the Payload class and replaced with BytesRef.
+  PayloadAttribute's name is unchanged, it just uses the BytesRef
+  class to refer to the payload bytes/start offset/end offset 
+  (or null if there is no payload).

[lucenenet] 01/05: Lucene.Net: Addeed migration-guide.md (originally MIGRATE.txt). This is the original text from Lucene 4.8.1 without changes.

Reply via email to