This is an automated email from the ASF dual-hosted git repository.

nightowl888 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/lucenenet.git
commit 8caf7b647ec37bfd2fe8e10bb7263bfdf3a2c6a4
Author: Shad Storhaug <[email protected]>
AuthorDate: Tue Mar 30 19:36:21 2021 +0700

    docs: migration-guide.md: Fixed formatting so code examples are inside of lists and lists continue after the code
---
 src/Lucene.Net/migration-guide.md | 460 +++++++++++++++++---------------------
 1 file changed, 200 insertions(+), 260 deletions(-)

diff --git a/src/Lucene.Net/migration-guide.md b/src/Lucene.Net/migration-guide.md
index 35aa0b9..bdf1201 100644
--- a/src/Lucene.Net/migration-guide.md
+++ b/src/Lucene.Net/migration-guide.md
@@ -66,69 +66,59 @@ enumeration APIs. Here are the major changes:
 * Fields are separately enumerated (`Fields.GetEnumerator()`) from the
   terms within each field (`TermEnum`). So instead of this:
-
-```cs
-    TermEnum termsEnum = ...;
-    while (termsEnum.Next())
-    {
-        Term t = termsEnum.Term;
-        Console.WriteLine("field=" + t.Field + "; text=" + t.Text);
-    }
-```
-
-  Do this:
-
-```cs
-    foreach (string field in fields)
+  ```cs
+  TermEnum termsEnum = ...;
+  while (termsEnum.Next())
+  {
+      Term t = termsEnum.Term;
+      Console.WriteLine("field=" + t.Field + "; text=" + t.Text);
+  }
+  ```
+  Do this:
+  ```cs
+  foreach (string field in fields)
+  {
+      Terms terms = fields.GetTerms(field);
+      TermsEnum termsEnum = terms.GetEnumerator();
+      BytesRef text;
+      while(termsEnum.MoveNext())
     {
-        Terms terms = fields.GetTerms(field);
-        TermsEnum termsEnum = terms.GetEnumerator();
-        BytesRef text;
-        while(termsEnum.MoveNext())
-        {
-            Console.WriteLine("field=" + field + "; text=" + termsEnum.Current.Utf8ToString());
-        }
+        Console.WriteLine("field=" + field + "; text=" + termsEnum.Current.Utf8ToString());
     }
-```
+  }
+  ```
 
 * `TermDocs` is renamed to `DocsEnum`. Instead of this:
-
-```cs
-    while (td.Next())
-    {
-        int doc = td.Doc;
-        ...
-    }
-```
-
-  do this:
-
-```cs
-    int doc;
-    while ((doc = td.Next()) != DocsEnum.NO_MORE_DOCS)
-    {
-        ...
-    }
-```
-
-  Instead of this:
-
-```cs
-    if (td.SkipTo(target))
-    {
-        int doc = td.Doc;
-        ...
-    }
-```
-
-  do this:
-
-```cs
-    if ((doc = td.Advance(target)) != DocsEnum.NO_MORE_DOCS)
-    {
-        ...
-    }
-```
+  ```cs
+  while (td.Next())
+  {
+      int doc = td.Doc;
+      ...
+  }
+  ```
+  do this:
+  ```cs
+  int doc;
+  while ((doc = td.Next()) != DocsEnum.NO_MORE_DOCS)
+  {
+      ...
+  }
+  ```
+  Instead of this:
+  ```cs
+  if (td.SkipTo(target))
+  {
+      int doc = td.Doc;
+      ...
+  }
+  ```
+  do this:
+  ```cs
+  if ((doc = td.Advance(target)) != DocsEnum.NO_MORE_DOCS)
+  {
+      ...
+  }
+  ```
 
 * `TermPositions` is renamed to `DocsAndPositionsEnum`, and no longer
   extends the docs only enumerator (`DocsEnum`).
@@ -142,32 +132,25 @@ enumeration APIs. Here are the major changes:
   `TermsEnum` is able to seek, and then you request the
   docs/positions enum from that `TermsEnum`.
 
-* `TermsEnum`'s seek method returns more information. So instead of
-  this:
-
-```cs
-    Term t;
-    TermEnum termEnum = reader.Terms(t);
-    if (t.Equals(termEnum.Term))
-    {
-        ...
-    }
-```
-
-  do this:
-
-```cs
-    TermsEnum termsEnum = ...;
-    BytesRef text;
-    if (termsEnum.Seek(text) == TermsEnum.SeekStatus.FOUND)
-    {
-        ...
-    }
-```
-
-  `SeekStatus` also contains `END` (enumerator is done) and `NOT_FOUND`
-  (term was not found but enumerator is now positioned to the next
-  term).
+* `TermsEnum`'s seek method returns more information. So instead of this:
+  ```cs
+  Term t;
+  TermEnum termEnum = reader.Terms(t);
+  if (t.Equals(termEnum.Term))
+  {
+      ...
+  }
+  ```
+  do this:
+  ```cs
+  TermsEnum termsEnum = ...;
+  BytesRef text;
+  if (termsEnum.Seek(text) == TermsEnum.SeekStatus.FOUND)
+  {
+      ...
+  }
+  ```
+  `SeekStatus` also contains `END` (enumerator is done) and `NOT_FOUND` (term was not found but enumerator is now positioned to the next term).
 
 * `TermsEnum` has an `Ord` property, returning the long numeric
   ordinal (ie, first term is 0, next is 1, and so on) for the term
@@ -175,92 +158,62 @@ enumeration APIs. Here are the major changes:
   ord) method. Note that these members are optional; in particular the
   `MultiFields` `TermsEnum` does not implement them.
-
 * How you obtain the enums has changed. The primary entry point is the
   `Fields` class. If you know your reader is a single segment reader,
   do this:
-
-```cs
-    Fields fields = reader.Fields();
-    if (fields != null)
-    {
-        ...
-    }
-```
-
-  If the reader might be multi-segment, you must do this:
-
-```cs
-    Fields fields = MultiFields.GetFields(reader);
-    if (fields != null)
-    {
-        ...
-    }
-```
-
-  The fields may be `null` (eg if the reader has no fields).
-
-  Note that the `MultiFields` approach entails a performance hit on
-  `MultiReaders`, as it must merge terms/docs/positions on the fly. It's
-  generally better to instead get the sequential readers (use
-  `Lucene.Net.Util.ReaderUtil`) and then step through those readers yourself,
-  if you can (this is how Lucene drives searches).
-
-  If you pass a `SegmentReader` to `MultiFields.GetFields()` it will simply
-  return `reader.GetFields(), so there is no performance hit in that
-  case.
-
-  Once you have a non-null `Fields` you can do this:
-
-```cs
-    Terms terms = fields.GetTerms("field");
-    if (terms != null)
-    {
-        ...
-    }
-```
-
-  The terms may be `null` (eg if the field does not exist).
-
-  Once you have a non-null terms you can get an enum like this:
-
-```cs
-    TermsEnum termsEnum = terms.GetIterator();
-```
-
-  The returned `TermsEnum` will not be `null`.
-
-  You can then .Next() through the TermsEnum, or Seek. If you want a
-  `DocsEnum`, do this:
-
-```cs
-    IBits liveDocs = reader.GetLiveDocs();
-    DocsEnum docsEnum = null;
-
-    docsEnum = termsEnum.Docs(liveDocs, docsEnum, needsFreqs);
-```
-
-  You can pass in a prior `DocsEnum` and it will be reused if possible.
-
-  Likewise for `DocsAndPositionsEnum`.
-
-  `IndexReader` has several sugar methods (which just go through the
-  above steps, under the hood). Instead of:
-
-```cs
-    Term t;
-    TermDocs termDocs = reader.TermDocs;
-    termDocs.Seek(t);
-```
-
-  do this:
-
-```cs
-    Term t;
-    DocsEnum docsEnum = reader.GetTermDocsEnum(t);
-```
-
-  Likewise for `DocsAndPositionsEnum`.
+  ```cs
+  Fields fields = reader.Fields();
+  if (fields != null)
+  {
+      ...
+  }
+  ```
+  If the reader might be multi-segment, you must do this:
+  ```cs
+  Fields fields = MultiFields.GetFields(reader);
+  if (fields != null)
+  {
+      ...
+  }
+  ```
+  The fields may be `null` (eg if the reader has no fields).<br/>
+  Note that the `MultiFields` approach entails a performance hit on `MultiReaders`, as it must merge terms/docs/positions on the fly. It's generally better to instead get the sequential readers (use `Lucene.Net.Util.ReaderUtil`) and then step through those readers yourself, if you can (this is how Lucene drives searches).<br/>
+  If you pass a `SegmentReader` to `MultiFields.GetFields()` it will simply return `reader.GetFields()`, so there is no performance hit in that case.<br/>
+  Once you have a non-null `Fields` you can do this:
+  ```cs
+  Terms terms = fields.GetTerms("field");
+  if (terms != null)
+  {
+      ...
+  }
+  ```
+  The terms may be `null` (eg if the field does not exist).<br/>
+  Once you have a non-null terms you can get an enum like this:
+  ```cs
+  TermsEnum termsEnum = terms.GetIterator();
+  ```
+  The returned `TermsEnum` will not be `null`.<br/>
+  You can then .Next() through the TermsEnum, or Seek. If you want a `DocsEnum`, do this:
+  ```cs
+  IBits liveDocs = reader.GetLiveDocs();
+  DocsEnum docsEnum = null;
+
+  docsEnum = termsEnum.Docs(liveDocs, docsEnum, needsFreqs);
+  ```
+  You can pass in a prior `DocsEnum` and it will be reused if possible.<br/>
+  Likewise for `DocsAndPositionsEnum`.<br/>
+  `IndexReader` has several sugar methods (which just go through the above steps, under the hood). Instead of:
+  ```cs
+  Term t;
+  TermDocs termDocs = reader.TermDocs;
+  termDocs.Seek(t);
+  ```
+  do this:
+  ```cs
+  Term t;
+  DocsEnum docsEnum = reader.GetTermDocsEnum(t);
+  ```
+  Likewise for `DocsAndPositionsEnum`.
 
 ## [LUCENE-2380](https://issues.apache.org/jira/browse/LUCENE-2380): FieldCache.GetStrings/Index --> FieldCache.GetDocTerms/Index
@@ -272,28 +225,22 @@ enumeration APIs. Here are the major changes:
   with `GetTerms` (returning a `BinaryDocValues` instance).
   `BinaryDocValues` provides a `Get` method, taking a `docID` and a
   `BytesRef` to fill (which must not be `null`), and it fills it in with the
-  reference to the bytes for that term.
-
-  If you had code like this before:
-
-```cs
-    string[] values = FieldCache.DEFAULT.GetStrings(reader, field);
-    ...
-    string aValue = values[docID];
-```
-
-  you can do this instead:
-
-```cs
-    BinaryDocValues values = FieldCache.DEFAULT.GetTerms(reader, field);
-    ...
-    BytesRef term = new BytesRef();
-    values.Get(docID, term);
-    string aValue = term.Utf8ToString();
-```
-
-  Note however that it can be costly to convert to `String`, so it's
-  better to work directly with the `BytesRef`.
+  reference to the bytes for that term.<br/>
+  If you had code like this before:
+  ```cs
+  string[] values = FieldCache.DEFAULT.GetStrings(reader, field);
+  ...
+  string aValue = values[docID];
+  ```
+  you can do this instead:
+  ```cs
+  BinaryDocValues values = FieldCache.DEFAULT.GetTerms(reader, field);
+  ...
+  BytesRef term = new BytesRef();
+  values.Get(docID, term);
+  string aValue = term.Utf8ToString();
+  ```
+  Note however that it can be costly to convert to `String`, so it's better to work directly with the `BytesRef`.
 
 * Similarly, in `FieldCache`, GetStringIndex (returning a `StringIndex`
   instance, with direct arrays `int[]` order and `String[]` lookup) has
@@ -302,34 +249,25 @@ enumeration APIs. Here are the major changes:
   `GetOrd(int docID)` method to lookup the int order for a document,
   `LookupOrd(int ord, BytesRef result)` to lookup the term from a given
   order, and the sugar method `Get(int docID, BytesRef result)`
-  which internally calls `GetOrd` and then `LookupOrd`.
-
-  If you had code like this before:
-
-```cs
-    StringIndex idx = FieldCache.DEFAULT.GetStringIndex(reader, field);
-    ...
-    int ord = idx.order[docID];
-    String aValue = idx.lookup[ord];
-```
-
-  you can do this instead:
-
-```cs
-    DocTermsIndex idx = FieldCache.DEFAULT.GetTermsIndex(reader, field);
-    ...
-    int ord = idx.GetOrd(docID);
-    BytesRef term = new BytesRef();
-    idx.LookupOrd(ord, term);
-    String aValue = term.Utf8ToString();
-```
-
-  Note however that it can be costly to convert to `String`, so it's
-  better to work directly with the `BytesRef`.
-
-  `DocTermsIndex` also has a `GetTermsEnum()` method, which returns an
-  iterator (`TermsEnum`) over the term values in the index (ie,
-  iterates ord = 0..NumOrd-1).
+  which internally calls `GetOrd` and then `LookupOrd`.<br/>
+  If you had code like this before:
+  ```cs
+  StringIndex idx = FieldCache.DEFAULT.GetStringIndex(reader, field);
+  ...
+  int ord = idx.order[docID];
+  String aValue = idx.lookup[ord];
+  ```
+  you can do this instead:
+  ```cs
+  DocTermsIndex idx = FieldCache.DEFAULT.GetTermsIndex(reader, field);
+  ...
+  int ord = idx.GetOrd(docID);
+  BytesRef term = new BytesRef();
+  idx.LookupOrd(ord, term);
+  string aValue = term.Utf8ToString();
+  ```
+  Note however that it can be costly to convert to `String`, so it's better to work directly with the `BytesRef`.<br/>
+  `DocTermsIndex` also has a `GetTermsEnum()` method, which returns an iterator (`TermsEnum`) over the term values in the index (ie, iterates ord = 0..NumOrd-1).
 
 * `FieldComparator.StringComparatorLocale` has been removed.
   (it was very CPU costly since it does not compare using
@@ -347,17 +285,17 @@ enumeration APIs. Here are the major changes:
 
 ## [LUCENE-2600](https://issues.apache.org/jira/browse/LUCENE-2600): `IndexReader`s are now read-only
 
- Instead of `IndexReader.IsDeleted(int n)`, do this:
+Instead of `IndexReader.IsDeleted(int n)`, do this:
 
 ```cs
-    using Lucene.Net.Util;
-    using Lucene.Net.Index;
-
-    IBits liveDocs = MultiFields.GetLiveDocs(indexReader);
-    if (liveDocs != null && !liveDocs.Get(docID))
-    {
-        // document is deleted...
-    }
+using Lucene.Net.Util;
+using Lucene.Net.Index;
+
+IBits liveDocs = MultiFields.GetLiveDocs(indexReader);
+if (liveDocs != null && !liveDocs.Get(docID))
+{
+    // document is deleted...
+}
 ```
 
 ## [LUCENE-2858](https://issues.apache.org/jira/browse/LUCENE-2858), [LUCENE-3733](https://issues.apache.org/jira/browse/LUCENE-3733): `IndexReader` --> `AtomicReader`/`CompositeReader`/`DirectoryReader` refactoring
@@ -561,28 +499,30 @@ add a separate `StoredField` to the document, or you can use
 `TYPE_STORED` for the field:
 
 ```cs
-    Field f = new Field("field", "value", StringField.TYPE_STORED);
+Field f = new Field("field", "value", StringField.TYPE_STORED);
 ```
 
 Alternatively, if an existing type is close to what you want but you
 need to make a few changes, you can copy that type and make changes:
 
 ```cs
-    FieldType bodyType = new FieldType(TextField.TYPE_STORED);
-    bodyType.setStoreTermVectors(true);
+FieldType bodyType = new FieldType(TextField.TYPE_STORED)
+{
+    StoreTermVectors = true
+};
 ```
 
 You can of course also create your own `FieldType` from scratch:
 
 ```cs
-    FieldType t = new FieldType
-    {
-        Indexed = true,
-        Stored = true,
-        OmitNorms = true,
-        IndexOptions = IndexOptions.DOCS_AND_FREQS
-    };
-    t.Freeze();
+FieldType t = new FieldType
+{
+    Indexed = true,
+    Stored = true,
+    OmitNorms = true,
+    IndexOptions = IndexOptions.DOCS_AND_FREQS
+};
+t.Freeze();
 ```
 
 `FieldType` has a `Freeze()` method to prevent further changes.
@@ -594,13 +534,13 @@ enums.
 
 When migrating from the 3.x API, if you did this before:
 
 ```cs
-    new Field("field", "value", Field.Store.NO, Field.Indexed.NOT_ANALYZED_NO_NORMS)
+new Field("field", "value", Field.Store.NO, Field.Indexed.NOT_ANALYZED_NO_NORMS)
 ```
 
 you can now do this:
 
 ```cs
-    new StringField("field", "value")
+new StringField("field", "value")
 ```
 
 (though note that `StringField` indexes `DOCS_ONLY`).
@@ -608,81 +548,81 @@ you can now do this:
 
 If instead the value was stored:
 
 ```cs
-    new Field("field", "value", Field.Store.YES, Field.Indexed.NOT_ANALYZED_NO_NORMS)
+new Field("field", "value", Field.Store.YES, Field.Indexed.NOT_ANALYZED_NO_NORMS)
 ```
 
 you can now do this:
 
 ```cs
-    new Field("field", "value", TextField.TYPE_STORED)
+new Field("field", "value", TextField.TYPE_STORED)
 ```
 
 If you didn't omit norms:
 
 ```cs
-    new Field("field", "value", Field.Store.YES, Field.Indexed.NOT_ANALYZED)
+new Field("field", "value", Field.Store.YES, Field.Indexed.NOT_ANALYZED)
 ```
 
 you can now do this:
 
 ```cs
-    FieldType ft = new FieldType(TextField.TYPE_STORED)
-    {
-        OmitNorms = false
-    };
-    new Field("field", "value", ft)
+FieldType ft = new FieldType(TextField.TYPE_STORED)
+{
+    OmitNorms = false
+};
+new Field("field", "value", ft)
 ```
 
 If you did this before (value can be `String` or `TextReader`):
 
 ```cs
-    new Field("field", value, Field.Store.NO, Field.Indexed.ANALYZED)
+new Field("field", value, Field.Store.NO, Field.Indexed.ANALYZED)
 ```
 
 you can now do this:
 
 ```cs
-    new TextField("field", value, Field.Store.NO)
+new TextField("field", value, Field.Store.NO)
 ```
 
 If instead the value was stored:
 
 ```cs
-    new Field("field", value, Field.Store.YES, Field.Indexed.ANALYZED)
+new Field("field", value, Field.Store.YES, Field.Indexed.ANALYZED)
 ```
 
 you can now do this:
 
 ```cs
-    new TextField("field", value, Field.Store.YES)
+new TextField("field", value, Field.Store.YES)
 ```
 
 If in addition you omit norms:
 
 ```cs
-    new Field("field", value, Field.Store.YES, Field.Indexed.ANALYZED_NO_NORMS)
+new Field("field", value, Field.Store.YES, Field.Indexed.ANALYZED_NO_NORMS)
 ```
 
 you can now do this:
 
 ```cs
-    FieldType ft = new FieldType(TextField.TYPE_STORED)
-    {
-        OmitNorms = true
-    };
-    new Field("field", value, ft)
+FieldType ft = new FieldType(TextField.TYPE_STORED)
+{
+    OmitNorms = true
+};
+new Field("field", value, ft)
```

If you did this before (bytes is a `byte[]`):

```cs
-    new Field("field", bytes)
+new Field("field", bytes)
```

you can now do this:

```cs
-    new StoredField("field", bytes)
+new StoredField("field", bytes)
```

If you previously used the setter of `Document.Boost`, you must now pre-multiply
