This is an automated email from the ASF dual-hosted git repository. nightowl888 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/lucenenet.git
commit c8a0f00f0c92c0c55d3bb1ed0b6f26133774eb5e Author: Shad Storhaug <[email protected]> AuthorDate: Sun Mar 28 20:36:57 2021 +0700 docs: Lucene.Net.ICU: Restructured documentation and fixed broken formatting and links (see #284, #300) --- .../Collation/TokenAttributes/package.md | 4 +- src/Lucene.Net.Analysis.ICU/overview.md | 265 ++++++++++++--------- src/dotnet/Lucene.Net.ICU/Lucene.Net.ICU.csproj | 18 +- src/dotnet/Lucene.Net.ICU/overview.md | 66 +++++ websites/apidocs/docfx.icu.json | 12 +- websites/apidocs/docfx.json | 1 + websites/apidocs/index.md | 4 +- 7 files changed, 239 insertions(+), 131 deletions(-) diff --git a/src/Lucene.Net.Analysis.ICU/Collation/TokenAttributes/package.md b/src/Lucene.Net.Analysis.ICU/Collation/TokenAttributes/package.md index 4c6ef88..e429c8b 100644 --- a/src/Lucene.Net.Analysis.ICU/Collation/TokenAttributes/package.md +++ b/src/Lucene.Net.Analysis.ICU/Collation/TokenAttributes/package.md @@ -1,4 +1,4 @@ ---- +--- uid: Lucene.Net.Collation.TokenAttributes summary: *content --- @@ -20,4 +20,4 @@ summary: *content limitations under the License. --> -Custom <xref:Lucene.Net.Util.AttributeImpl> for indexing collation keys as index terms. \ No newline at end of file +Custom <xref:Lucene.Net.Util.Attribute> for indexing collation keys as index terms. \ No newline at end of file diff --git a/src/Lucene.Net.Analysis.ICU/overview.md b/src/Lucene.Net.Analysis.ICU/overview.md index 95609f8..59d49d6 100644 --- a/src/Lucene.Net.Analysis.ICU/overview.md +++ b/src/Lucene.Net.Analysis.ICU/overview.md @@ -1,4 +1,4 @@ ---- +--- uid: Lucene.Net.Analysis.Icu summary: *content --- @@ -22,16 +22,26 @@ summary: *content <!-- :Post-Release-Update-Version.LUCENE_XY: - several mentions in this file --> This module exposes functionality from -[ICU](http://site.icu-project.org/) to Apache Lucene. ICU4J is a Java -library that enhances Java's internationalization support by improving +[ICU](http://site.icu-project.org/) to Apache Lucene. 
ICU4N is a .NET +library that enhances .NET's internationalization support by improving performance, keeping current with the Unicode Standard, and providing richer -APIs. +APIs. + +> [!NOTE] +> The <xref:Lucene.Net.Analysis.Icu> namespace was ported from Lucene 7.1.0 to get a more up-to-date version of Unicode than what shipped with Lucene 4.8.0. + +> [!NOTE] +> Since the .NET platform doesn't provide a BreakIterator class (or similar), the functionality that utilizes it was consolidated from Java Lucene's analyzers-icu package, <xref:Lucene.Net.Analysis.Common> and <xref:Lucene.Net.Highlighter> into this unified package. + +> [!WARNING] +> While ICU4N's BreakIterator has customizable rules, its default behavior is not the same as the one in the JDK. When using any features of this package outside of the <xref:Lucene.Net.Analysis.Icu> namespace, they will behave differently than they do in Java Lucene and the rules may need some tweaking to fit your needs. See the [Break Rules](http://userguide.icu-project.org/boundaryanalysis/break-rules) ICU documentation for details on how to customize `ICU4N.Text.RuleBasedBreakIterator`. + + For an introduction to Lucene's analysis API, see the <xref:Lucene.Net.Analysis> package documentation. This module exposes the following functionality: -* [Text Segmentation](#segmentation): Tokenizes text based on +* [Text Segmentation](#text-segmentation): Tokenizes text based on properties and rules defined in Unicode. * [Collation](#collation): Compare strings according to the @@ -40,18 +50,18 @@ For an introduction to Lucene's analysis API, see the <xref:Lucene.Net.Analysis> * [Normalization](#normalization): Converts text to a unique, equivalent form. -* [Case Folding](#casefolding): Removes case distinctions with +* [Case Folding](#case-folding): Removes case distinctions with Unicode's Default Caseless Matching algorithm. 
-* [Search Term Folding](#searchfolding): Removes distinctions +* [Search Term Folding](#search-term-folding): Removes distinctions (such as accent marks) between similar characters for a loose or fuzzy search. -* [Text Transformation](#transform): Transforms Unicode text in +* [Text Transformation](#text-transformation): Transforms Unicode text in a context-sensitive fashion: e.g. mapping Traditional to Simplified Chinese * * * -# [Text Segmentation]() +# Text Segmentation Text Segmentation (Tokenization) divides document and query text into index terms (typically words). Unicode provides special properties and rules so that this can be done in a manner that works well with most languages. @@ -66,26 +76,26 @@ For an introduction to Lucene's analysis API, see the <xref:Lucene.Net.Analysis> ### Tokenizing multilanguage text - /** - * This tokenizer will work well in general for most languages. - */ - Tokenizer tokenizer = new ICUTokenizer(reader); +```cs +// This tokenizer will work well in general for most languages. +Tokenizer tokenizer = new ICUTokenizer(reader); +``` * * * -# [Collation]() +# Collation - `ICUCollationKeyAnalyzer` converts each token into its binary `CollationKey` using the provided `Collator`, allowing it to be stored as an index term. + <xref:Lucene.Net.Collation.ICUCollationKeyAnalyzer> converts each token into its binary `CollationKey` using the provided `Collator`, allowing it to be stored as an index term. - `ICUCollationKeyAnalyzer` depends on ICU4J to produce the `CollationKey`s. + <xref:Lucene.Net.Collation.ICUCollationKeyAnalyzer> depends on ICU4N to produce the `CollationKey`s. ## Use Cases * Efficient sorting of terms in languages that use non-Unicode character - orderings. (Lucene Sort using a Locale can be very slow.) + orderings. (Lucene Sort using a CultureInfo can be very slow.) * Efficient range queries over fields that contain terms in languages that - use non-Unicode character orderings. 
(Range queries using a Locale can be + use non-Unicode character orderings. (Range queries using a CultureInfo can be very slow.) * Effective Locale-specific normalization (case differences, diacritics, etc.). @@ -97,80 +107,99 @@ For an introduction to Lucene's analysis API, see the <xref:Lucene.Net.Analysis> ### Farsi Range Queries - Collator collator = Collator.getInstance(new ULocale("ar")); - ICUCollationKeyAnalyzer analyzer = new ICUCollationKeyAnalyzer(Version.LUCENE_48, collator); - RAMDirectory ramDir = new RAMDirectory(); - IndexWriter writer = new IndexWriter(ramDir, new IndexWriterConfig(Version.LUCENE_48, analyzer)); - Document doc = new Document(); - doc.add(new Field("content", "\u0633\u0627\u0628", - Field.Store.YES, Field.Index.ANALYZED)); - writer.addDocument(doc); - writer.close(); - IndexSearcher is = new IndexSearcher(ramDir, true); - - QueryParser aqp = new QueryParser(Version.LUCENE_48, "content", analyzer); - aqp.setAnalyzeRangeTerms(true); - - // Unicode order would include U+0633 in [ U+062F - U+0698 ], but Farsi - // orders the U+0698 character before the U+0633 character, so the single - // indexed Term above should NOT be returned by a ConstantScoreRangeQuery - // with a Farsi Collator (or an Arabic one for the case when Farsi is not - // supported). 
- ScoreDoc[] result - = is.search(aqp.parse("[ \u062F TO \u0698 ]"), null, 1000).scoreDocs; - assertEquals("The index Term should not be included.", 0, result.length); +```cs +const LuceneVersion matchVersion = LuceneVersion.LUCENE_48; +Collator collator = Collator.GetInstance(new UCultureInfo("ar")); +ICUCollationKeyAnalyzer analyzer = new ICUCollationKeyAnalyzer(matchVersion, collator); +RAMDirectory ramDir = new RAMDirectory(); +using IndexWriter writer = new IndexWriter(ramDir, new IndexWriterConfig(matchVersion, analyzer)); +writer.AddDocument(new Document { + new TextField("content", "\u0633\u0627\u0628", Field.Store.YES) +}); +using IndexReader reader = writer.GetReader(applyAllDeletes: true); +writer.Dispose(); +IndexSearcher searcher = new IndexSearcher(reader); + +QueryParser queryParser = new QueryParser(matchVersion, "content", analyzer) +{ + AnalyzeRangeTerms = true +}; + +// Unicode order would include U+0633 in [ U+062F - U+0698 ], but Farsi +// orders the U+0698 character before the U+0633 character, so the single +// indexed Term above should NOT be returned by a ConstantScoreRangeQuery +// with a Farsi Collator (or an Arabic one for the case when Farsi is not +// supported). 
+ScoreDoc[] result = searcher.Search(queryParser.Parse("[ \u062F TO \u0698 ]"), null, 1000).ScoreDocs; +Assert.AreEqual(0, result.Length, "The index Term should not be included."); +``` ### Danish Sorting - Analyzer analyzer - = new ICUCollationKeyAnalyzer(Version.LUCENE_48, Collator.getInstance(new ULocale("da", "dk"))); - RAMDirectory indexStore = new RAMDirectory(); - IndexWriter writer = new IndexWriter(indexStore, new IndexWriterConfig(Version.LUCENE_48, analyzer)); - String[] tracer = new String[] { "A", "B", "C", "D", "E" }; - String[] data = new String[] { "HAT", "HUT", "H\u00C5T", "H\u00D8T", "HOT" }; - String[] sortedTracerOrder = new String[] { "A", "E", "B", "D", "C" }; - for (int i = 0 ; i < data.length ; ++i) { - Document doc = new Document(); - doc.add(new Field("tracer", tracer[i], Field.Store.YES, Field.Index.NO)); - doc.add(new Field("contents", data[i], Field.Store.NO, Field.Index.ANALYZED)); - writer.addDocument(doc); - } - writer.close(); - IndexSearcher searcher = new IndexSearcher(indexStore, true); - Sort sort = new Sort(); - sort.setSort(new SortField("contents", SortField.STRI [...] 
+```cs +const LuceneVersion matchVersion = LuceneVersion.LUCENE_48; +Analyzer analyzer = new ICUCollationKeyAnalyzer(matchVersion, Collator.GetInstance(new UCultureInfo("da-dk"))); +string indexPath = Path.Combine(Path.GetTempPath(), Path.GetFileNameWithoutExtension(Path.GetTempFileName())); +Directory dir = FSDirectory.Open(indexPath); +using IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(matchVersion, analyzer)); +string[] tracer = new string[] { "A", "B", "C", "D", "E" }; +string[] data = new string[] { "HAT", "HUT", "H\u00C5T", "H\u00D8T", "HOT" }; +string[] sortedTracerOrder = new string[] { "A", "E", "B", "D", "C" }; +for (int i = 0; i < data.Length; ++i) +{ + writer.AddDocument(new Document + { + new StringField("tracer", tracer[i], Field.Store.YES), + new TextField("contents", data[i], Field.Store.NO) + }); +} +using IndexReader reader = writer.GetReader(applyAllDeletes: true); +writer.Dispose(); +IndexSearcher searcher = new IndexSearcher(reader); +Sort sort = new Sort(); +sort.SetSort(new SortField("contents", SortFieldType.STRING)); +Query query = new MatchAllDocsQuery(); +ScoreDoc[] result = searcher.Search(query, null, 1000, sort).ScoreDocs; +for (int i = 0; i < result.Length; ++i) +{ + Document doc = searcher.Doc(result[i].Doc); + Assert.AreEqual(sortedTracerOrder[i], doc.GetValues("tracer")[0]); +} +``` ### Turkish Case Normalization - Collator collator = Collator.getInstance(new ULocale("tr", "TR")); - collator.setStrength(Collator.PRIMARY); - Analyzer analyzer = new ICUCollationKeyAnalyzer(Version.LUCENE_48, collator); - RAMDirectory ramDir = new RAMDirectory(); - IndexWriter writer = new IndexWriter(ramDir, new IndexWriterConfig(Version.LUCENE_48, analyzer)); - Document doc = new Document(); - doc.add(new Field("contents", "DIGY", Field.Store.NO, Field.Index.ANALYZED)); - writer.addDocument(doc); - writer.close(); - IndexSearcher is = new IndexSearcher(ramDir, true); - QueryParser parser = new QueryParser(Version.LUCENE_48, 
"contents", analyzer); - Query query = parser.parse("d\u0131gy"); // U+0131: dotless i - ScoreDoc[] result = is.search(query, null, 1000).scoreDocs; - assertEquals("The index Term should be included.", 1, result.length); +```cs +const LuceneVersion matchVersion = LuceneVersion.LUCENE_48; +Collator collator = Collator.GetInstance(new UCultureInfo("tr-TR")); +collator.Strength = CollationStrength.Primary; +Analyzer analyzer = new ICUCollationKeyAnalyzer(matchVersion, collator); +string indexPath = Path.Combine(Path.GetTempPath(), Path.GetFileNameWithoutExtension(Path.GetTempFileName())); +Directory dir = FSDirectory.Open(indexPath); +using IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(matchVersion, analyzer)); +writer.AddDocument(new Document { + new TextField("contents", "DIGY", Field.Store.NO) +}); +using IndexReader reader = writer.GetReader(applyAllDeletes: true); +writer.Dispose(); +IndexSearcher searcher = new IndexSearcher(reader); +QueryParser parser = new QueryParser(matchVersion, "contents", analyzer); +Query query = parser.Parse("d\u0131gy"); // U+0131: dotless i +ScoreDoc[] result = searcher.Search(query, null, 1000).ScoreDocs; +Assert.AreEqual(1, result.Length, "The index Term should be included."); +``` ## Caveats and Comparisons - __WARNING:__ Make sure you use exactly the same `Collator` at index and query time -- `CollationKey`s are only comparable when produced by the same `Collator`. Since {@link java.text.RuleBasedCollator}s are not independently versioned, it is unsafe to search against stored `CollationKey`s unless the following are exactly the same (best practice is to store this information with the index and check that they remain the same at query time): - -1. JVM vendor - -2. JVM version, including patch version - -3. The language (and country and variant, if specified) of the Locale - used when constructing the collator via - {@link java.text.Collator#getInstance(java.util.Locale)}. - -4. 
The collation strength used - see {@link java.text.Collator#setStrength(int)} - - `ICUCollationKeyAnalyzer` uses ICU4J's `Collator`, which makes its version available, thus allowing collation to be versioned independently from the JVM. `ICUCollationKeyAnalyzer` is also significantly faster and generates significantly shorter keys than `CollationKeyAnalyzer`. See [http://site.icu-project.org/charts/collation-icu4j-sun](http://site.icu-project.org/charts/collation-icu4j-sun) for key generation timing and key length comparisons between ICU4J and `java.text.Collator` ove [...] + `ICUCollationKeyAnalyzer` uses ICU4N's `Collator`, which makes its version available, thus allowing collation to be versioned independently from the .NET target framework. `ICUCollationKeyAnalyzer` is also fast. - `CollationKey`s generated by `java.text.Collator`s are not compatible with those those generated by ICU Collators. Specifically, if you use `CollationKeyAnalyzer` to generate index terms, do not use `ICUCollationKeyAnalyzer` on the query side, or vice versa. + `SortKey`s generated by `CompareInfo`s are not compatible with those generated by ICU Collators. Specifically, if you use `CollationKeyAnalyzer` to generate index terms, do not use `ICUCollationKeyAnalyzer` on the query side, or vice versa. * * * -# [Normalization]() +# Normalization - `ICUNormalizer2Filter` normalizes term text to a [Unicode Normalization Form](http://unicode.org/reports/tr15/), so that [equivalent](http://en.wikipedia.org/wiki/Unicode_equivalence) forms are standardized to a unique form. + <xref:Lucene.Net.Analysis.Icu.ICUNormalizer2Filter> normalizes term text to a [Unicode Normalization Form](http://unicode.org/reports/tr15/), so that [equivalent](http://en.wikipedia.org/wiki/Unicode_equivalence) forms are standardized to a unique form. 
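+The equivalence point can be illustrated directly with the `Normalizer2` API used elsewhere in this module's examples. This is a minimal sketch, not part of an analysis chain; it assumes ICU4N mirrors ICU4J's `Normalizer2.Normalize(string)` surface, and the string literals are purely illustrative. +```cs +using ICU4N.Text; + +// "é" can be encoded precomposed (U+00E9) or decomposed as "e" plus the +// combining acute accent (U+0301). The strings are canonically equivalent +// but not binary-equal. +string precomposed = "caf\u00E9"; +string decomposed = "cafe\u0301"; + +// NFC standardizes both to the same unique form, which is why +// ICUNormalizer2Filter makes equivalent inputs produce identical index terms. +Normalizer2 normalizer = Normalizer2.GetInstance(null, "nfc", Normalizer2Mode.Compose); + +bool equalBeforeNormalization = precomposed == decomposed; // false +bool equalAfterNormalization = + normalizer.Normalize(precomposed) == normalizer.Normalize(decomposed); // true +``` +Inside an analyzer, <xref:Lucene.Net.Analysis.Icu.ICUNormalizer2Filter> applies this same transformation token by token, as shown in the examples below. 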
## Use Cases @@ -183,18 +212,16 @@ For an introduction to Lucene's analysis API, see the <xref:Lucene.Net.Analysis> ### Normalizing text to NFC - /** - * Normalizer2 objects are unmodifiable and immutable. - */ - Normalizer2 normalizer = Normalizer2.getInstance(null, "nfc", Normalizer2.Mode.COMPOSE); - /** - * This filter will normalize to NFC. - */ - TokenStream tokenstream = new ICUNormalizer2Filter(tokenizer, normalizer); +```cs +// Normalizer2 objects are unmodifiable and immutable. +Normalizer2 normalizer = Normalizer2.GetInstance(null, "nfc", Normalizer2Mode.Compose); +// This filter will normalize to NFC. +TokenStream tokenstream = new ICUNormalizer2Filter(tokenizer, normalizer); +``` * * * -# [Case Folding]() +# Case Folding Default caseless matching, or case-folding is more than just conversion to lowercase. For example, it handles cases such as the Greek sigma, so that "Μάϊος" and "ΜΆΪΟΣ" will match correctly. @@ -211,14 +238,14 @@ For an introduction to Lucene's analysis API, see the <xref:Lucene.Net.Analysis> ### Lowercasing text - /** - * This filter will case-fold and normalize to NFKC. - */ - TokenStream tokenstream = new ICUNormalizer2Filter(tokenizer); +```cs +// This filter will case-fold and normalize to NFKC. +TokenStream tokenstream = new ICUNormalizer2Filter(tokenizer); +``` * * * -# [Search Term Folding]() +# Search Term Folding Search term folding removes distinctions (such as accent marks) between similar characters. It is useful for a fuzzy or loose search. @@ -233,15 +260,15 @@ For an introduction to Lucene's analysis API, see the <xref:Lucene.Net.Analysis> ### Removing accents - /** - * This filter will case-fold, remove accents and other distinctions, and - * normalize to NFKC. - */ - TokenStream tokenstream = new ICUFoldingFilter(tokenizer); +```cs +// This filter will case-fold, remove accents and other distinctions, and +// normalize to NFKC. 
+TokenStream tokenstream = new ICUFoldingFilter(tokenizer); +``` * * * -# [Text Transformation]() +# Text Transformation ICU provides text-transformation functionality via its Transliteration API. This allows you to transform text in a variety of ways, taking context into account. @@ -257,36 +284,36 @@ For an introduction to Lucene's analysis API, see the <xref:Lucene.Net.Analysis> ### Convert Traditional to Simplified - /** - * This filter will map Traditional Chinese to Simplified Chinese - */ - TokenStream tokenstream = new ICUTransformFilter(tokenizer, Transliterator.getInstance("Traditional-Simplified")); +```cs +// This filter will map Traditional Chinese to Simplified Chinese +TokenStream tokenstream = new ICUTransformFilter(tokenizer, Transliterator.GetInstance("Traditional-Simplified")); +``` ### Transliterate Serbian Cyrillic to Serbian Latin - /** - * This filter will map Serbian Cyrillic to Serbian Latin according to BGN rules - */ - TokenStream tokenstream = new ICUTransformFilter(tokenizer, Transliterator.getInstance("Serbian-Latin/BGN")); +```cs +// This filter will map Serbian Cyrillic to Serbian Latin according to BGN rules +TokenStream tokenstream = new ICUTransformFilter(tokenizer, Transliterator.GetInstance("Serbian-Latin/BGN")); +``` * * * -# [Backwards Compatibility]() +# Backwards Compatibility - This module exists to provide up-to-date Unicode functionality that supports the most recent version of Unicode (currently 6.3). However, some users who wish for stronger backwards compatibility can restrict <xref:Lucene.Net.Analysis.Icu.ICUNormalizer2Filter> to operate on only a specific Unicode Version by using a {@link com.ibm.icu.text.FilteredNormalizer2}. + This module exists to provide up-to-date Unicode functionality that supports the most recent version of Unicode (currently 8.0). 
However, some users who wish for stronger backwards compatibility can restrict <xref:Lucene.Net.Analysis.Icu.ICUNormalizer2Filter> to operate on only a specific Unicode Version by using a FilteredNormalizer2. ## Example Usages ### Restricting normalization to Unicode 5.0 - /** - * This filter will do NFC normalization, but will ignore any characters that - * did not exist as of Unicode 5.0. Because of the normalization stability policy - * of Unicode, this is an easy way to force normalization to a specific version. - */ - Normalizer2 normalizer = Normalizer2.getInstance(null, "nfc", Normalizer2.Mode.COMPOSE); - UnicodeSet set = new UnicodeSet("[:age=5.0:]"); - // see FilteredNormalizer2 docs, the set should be frozen or performance will suffer - set.freeze(); - FilteredNormalizer2 unicode50 = new FilteredNormalizer2(normalizer, set); - TokenStream tokenstream = new ICUNormalizer2Filter(tokenizer, unicode50); \ No newline at end of file +```cs +// This filter will do NFC normalization, but will ignore any characters that +// did not exist as of Unicode 5.0. Because of the normalization stability policy +// of Unicode, this is an easy way to force normalization to a specific version. 
+Normalizer2 normalizer = Normalizer2.GetInstance(null, "nfc", Normalizer2Mode.Compose); +UnicodeSet set = new UnicodeSet("[:age=5.0:]"); +// see FilteredNormalizer2 docs, the set should be frozen or performance will suffer +set.Freeze(); +FilteredNormalizer2 unicode50 = new FilteredNormalizer2(normalizer, set); +TokenStream tokenstream = new ICUNormalizer2Filter(tokenizer, unicode50); +``` \ No newline at end of file diff --git a/src/dotnet/Lucene.Net.ICU/Lucene.Net.ICU.csproj b/src/dotnet/Lucene.Net.ICU/Lucene.Net.ICU.csproj index 050614a..0c8bc3b 100644 --- a/src/dotnet/Lucene.Net.ICU/Lucene.Net.ICU.csproj +++ b/src/dotnet/Lucene.Net.ICU/Lucene.Net.ICU.csproj @@ -37,18 +37,26 @@ <ItemGroup> <Compile Include="..\..\Lucene.Net.Analysis.Common\Analysis\Th\**\*.cs" LinkBase="Analysis\Th" /> - <EmbeddedResource Include="..\..\Lucene.Net.Analysis.Common\Analysis\Th\stopwords.txt" Link="Analysis\Th\stopwords.txt" /> - <EmbeddedResource Include="Support\*.brk" /> <Compile Include="..\..\Lucene.Net.Analysis.Common\Analysis\Util\CharArrayIterator.cs" Link="Analysis\Util\CharArrayIterator.cs" /> <Compile Include="..\..\Lucene.Net.Analysis.Common\Analysis\Util\SegmentingTokenizerBase.cs" Link="Analysis\Util\SegmentingTokenizerBase.cs" /> <Compile Include="..\..\Lucene.Net.Analysis.ICU\Analysis\**\*.cs" LinkBase="Analysis" /> - <EmbeddedResource Include="..\..\Lucene.Net.Analysis.ICU\Analysis\**\*.nrm" LinkBase="Analysis" /> - <EmbeddedResource Include="..\..\Lucene.Net.Analysis.ICU\Analysis\**\*.brk" LinkBase="Analysis" /> <Compile Include="..\..\Lucene.Net.Analysis.ICU\Collation\**\*.cs" LinkBase="Collation" /> <Compile Include="..\..\Lucene.Net.Highlighter\PostingsHighlight\**\*.cs" LinkBase="Search\PostingsHighlight" /> + <Compile Include="..\..\Lucene.Net.Highlighter\VectorHighlight\BreakIteratorBoundaryScanner.cs" Link="Search\VectorHighlight\BreakIteratorBoundaryScanner.cs" /> + </ItemGroup> + + <ItemGroup Label="Embedded Resources"> + <EmbeddedResource 
Include="..\..\Lucene.Net.Analysis.Common\Analysis\Th\stopwords.txt" Link="Analysis\Th\stopwords.txt" /> + <EmbeddedResource Include="Support\*.brk" /> + <EmbeddedResource Include="..\..\Lucene.Net.Analysis.ICU\Analysis\**\*.nrm" LinkBase="Analysis" /> + <EmbeddedResource Include="..\..\Lucene.Net.Analysis.ICU\Analysis\**\*.brk" LinkBase="Analysis" /> <EmbeddedResource Include="..\..\Lucene.Net.Highlighter\PostingsHighlight\**\*.brk" LinkBase="Search\PostingsHighlight" /> <None Remove="Support\*.brk" /> - <Compile Include="..\..\Lucene.Net.Highlighter\VectorHighlight\BreakIteratorBoundaryScanner.cs" Link="Search\VectorHighlight\BreakIteratorBoundaryScanner.cs" /> + </ItemGroup> + + <ItemGroup Label="Documentation"> + <None Include="..\..\Lucene.Net.Analysis.Common\Analysis\Th\**\*.md" LinkBase="Analysis\Th" /> + <None Include="..\..\Lucene.Net.Highlighter\PostingsHighlight\**\*.md" LinkBase="Search\PostingsHighlight" /> </ItemGroup> <ItemGroup> diff --git a/src/dotnet/Lucene.Net.ICU/overview.md b/src/dotnet/Lucene.Net.ICU/overview.md new file mode 100644 index 0000000..fd982db --- /dev/null +++ b/src/dotnet/Lucene.Net.ICU/overview.md @@ -0,0 +1,66 @@ +--- +uid: Lucene.Net.ICU +title: Lucene.Net.ICU +summary: *content +--- + +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the License for the specific language governing permissions and + limitations under the License. +--> + +This module exposes functionality from +[ICU](http://site.icu-project.org/) to Apache Lucene. ICU4N is a .NET +library that enhances .NET's internationalization support by improving +performance, keeping current with the Unicode Standard, and providing richer +APIs. + +> [!NOTE] +> Since the .NET platform doesn't provide a BreakIterator class (or similar), the functionality that utilizes it was consolidated from Java Lucene's analyzers-icu package, <xref:Lucene.Net.Analysis.Common> and <xref:Lucene.Net.Highlighter> into this unified package. + +> [!WARNING] +> While ICU4N's BreakIterator has customizable rules, its default behavior is not the same as the one in the JDK. When using any features of this package outside of the <xref:Lucene.Net.Analysis.Icu> namespace, they will behave differently than they do in Java Lucene and the rules may need some tweaking to fit your needs. See the [Break Rules](http://userguide.icu-project.org/boundaryanalysis/break-rules) ICU documentation for details on how to customize `ICU4N.Text.RuleBasedBreakIterator`. + + + +This module exposes the following functionality: + +* [Text Analysis](xref:Lucene.Net.Analysis.Icu): For an introduction to Lucene's analysis API, see the <xref:Lucene.Net.Analysis> package documentation. + + * [Text Segmentation](xref:Lucene.Net.Analysis.Icu#text-segmentation): Tokenizes text based on + properties and rules defined in Unicode. + + * [Collation](xref:Lucene.Net.Analysis.Icu#collation): Compare strings according to the + conventions and standards of a particular language, region or country. + + * [Normalization](xref:Lucene.Net.Analysis.Icu#normalization): Converts text to a unique, + equivalent form. + + * [Case Folding](xref:Lucene.Net.Analysis.Icu#case-folding): Removes case distinctions with + Unicode's Default Caseless Matching algorithm. 
+ + * [Search Term Folding](xref:Lucene.Net.Analysis.Icu#search-term-folding): Removes distinctions + (such as accent marks) between similar characters for a loose or fuzzy search. + + * [Text Transformation](xref:Lucene.Net.Analysis.Icu#text-transformation): Transforms Unicode text in + a context-sensitive fashion: e.g. mapping Traditional to Simplified Chinese + + * [Thai Language Analysis](xref:Lucene.Net.Analysis.Th) + +* Unicode Highlighter Support + + * [Postings Highlighter](xref:Lucene.Net.Search.PostingsHighlight): Highlighter implementation that uses offsets from postings lists. + + * [Vector Highlighter](xref:Lucene.Net.Search.VectorHighlight.BreakIteratorBoundaryScanner): An implementation of IBoundaryScanner for use with the vector highlighter in the [Lucene.Net.Highlighter module](../highlighter/Lucene.Net.Search.Highlight.html). + diff --git a/websites/apidocs/docfx.icu.json b/websites/apidocs/docfx.icu.json index 89c4691..2049a53 100644 --- a/websites/apidocs/docfx.icu.json +++ b/websites/apidocs/docfx.icu.json @@ -1,4 +1,4 @@ -{ +{ "metadata": [ { "src": [ @@ -20,7 +20,13 @@ } ], "build": { - "content": [ + "content": [ + { + "files": [ + "overview.md" + ], + "src": "../../src/dotnet/Lucene.Net.ICU" + }, { "files": [ "**.yml", @@ -36,7 +42,7 @@ "src": "toc" } ], - "overwrite": [ + "overwrite": [ { "files": [ "**/package.md", diff --git a/websites/apidocs/docfx.json b/websites/apidocs/docfx.json index d6dca5c..3a2bad0 100644 --- a/websites/apidocs/docfx.json +++ b/websites/apidocs/docfx.json @@ -523,6 +523,7 @@ "Lucene.Net.Analysis.Morfologik/overview.md", "Lucene.Net.Analysis.OpenNLP/overview.md", "Lucene.Net.Highlighter/overview.md", + "Lucene.Net.ICU/overview.md", "Lucene.Net.Grouping/package.md", "Lucene.Net.QueryParser/overview.md", "Lucene.Net.Sandbox/overview.md", diff --git a/websites/apidocs/index.md b/websites/apidocs/index.md index 777265e..d57aefb 100644 --- a/websites/apidocs/index.md +++ b/websites/apidocs/index.md @@ -49,12 +49,12 @@ 
on some of the conceptual or inner details of Lucene: - <xref:Lucene.Net.Analysis.Stempel> - Analyzer for indexing Polish - [Lucene.Net.Benchmark](xref:Lucene.Net.Benchmarks) - System for benchmarking Lucene - <xref:Lucene.Net.Classification> - Classification module for Lucene -- <xref:Lucene.Net.Codecs> - Lucene codecs and postings formats +- [Lucene.Net.Codecs](api/codecs/overview.html) - Lucene codecs and postings formats - [Lucene.Net.Expressions](xref:Lucene.Net.Expressions) - Dynamically computed values to sort/facet/search on based on a pluggable grammar - [Lucene.Net.Facet](xref:Lucene.Net.Facet) - Faceted indexing and search capabilities - <xref:Lucene.Net.Grouping> - Collectors for grouping search results - <xref:Lucene.Net.Search.Highlight> - Highlights search keywords in results -- <xref:Lucene.Net.Analysis.Icu> - Specialized ICU (International Components for Unicode) Analyzers and Highlighters +- <xref:Lucene.Net.ICU> - Specialized ICU (International Components for Unicode) Analyzers and Highlighters - <xref:Lucene.Net.Join> - Index-time and Query-time joins for normalized content - [Lucene.Net.Memory](xref:Lucene.Net.Index.Memory) - Single-document in-memory index implementation - <xref:Lucene.Net.Misc> - Index tools and other miscellaneous code
