This is an automated email from the ASF dual-hosted git repository.

paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 1927aa2  update for lucene 10.1.0 plus add some colors
1927aa2 is described below

commit 1927aa2f21de6be6dc6c2e485ed37e769a782d73
Author: Paul King <[email protected]>
AuthorDate: Sun Dec 22 07:57:49 2024 +1000

    update for lucene 10.1.0 plus add some colors
---
 site/src/site/blog/groovy-lucene.adoc | 204 +++++++++++++++++-----------------
 1 file changed, 101 insertions(+), 103 deletions(-)
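The `display` helper introduced in this change draws one bar per map entry. It multiplies each value by `scale`, so small document counts (at most 10 here) can fill a 20-wide chart. The following is a dependency-free sketch of the same idea, not the post's code. It makes three substitutions, all assumptions: raw ANSI escape codes replace JColor's `colorize`, a repeated block character replaces the post's `bar` helper, and the names `renderBars`/`display` are illustrative.

```groovy
// Sketch only: raw ANSI escapes stand in for JColor's colorize(),
// and '█' * n stands in for the bar(...) helper used in the post.
List<String> renderBars(Map<String, Integer> data, int max, int scale = 1) {
    data.collect { String k, Integer v ->
        String label = "$k ($v)"
        // ANSI 35 = magenta for Apache projects, 34 = blue for everything else
        String color = k.startsWith('apache') ? '\u001B[35m' : '\u001B[34m'
        // scale stretches small counts; this sketch also caps the bar at max blocks
        int len = Math.min(v * scale, max)
        "${label.padRight(32)} ${color}${'█' * len}\u001B[0m▏".toString()
    }
}

// Print each rendered line, like the display(...) calls in the diff
void display(Map<String, Integer> data, int max, int scale = 1) {
    renderBars(data, max, scale).each { println it }
}

// Document counts top out at 10, so doubling them fills a 20-wide chart
display(['eclipse collections': 10, 'apache spark': 6], 20, 2)
```

Passing `scale = 2` mirrors the `display(..., 20, 2)` calls in the diff, where document counts are doubled before drawing.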

diff --git a/site/src/site/blog/groovy-lucene.adoc b/site/src/site/blog/groovy-lucene.adoc
index 65cb60f..0b8077d 100644
--- a/site/src/site/blog/groovy-lucene.adoc
+++ b/site/src/site/blog/groovy-lucene.adoc
@@ -72,11 +72,26 @@ are wanting to follow along and run these examples:
 ----
 String baseDir = '/projects/groovy-website/site/src/site/blog' // <1>
 ----
-<1> You'd need to check out the Groovy website and point to it here
+<1> You'd need to check out the Groovy website and point `baseDir` to it here
 
-Now our script will traverse all the files in that directory, processing them with our regex
-and track the hits we find.
+First, let's create a little helper method for printing a pretty
+graph of our results (we'll use the `colorize` method from https://github.com/dialex/JColor[JColor]):
 
+[source,groovy]
+----
+def display(Map<String, Integer> data, int max, int scale = 1) {
+    data.each { k, v ->
+        var label = "$k ($v)"
+        var color = k.startsWith('apache') ? MAGENTA_TEXT() : BLUE_TEXT()
+        println "${label.padRight(32)} ${colorize(bar(v * scale, 0, max, max), color)}"
+    }
+}
+----
+
+Now our script will traverse all the files in that directory,
+processing them with our regex and track the hits we find.
+
+// Matcher.groovy
 [source,groovy]
 ----
 var histogram = [:].withDefault { 0 } // <1>
@@ -92,10 +107,7 @@ new File(baseDir).traverse(nameFilter: ~/.*\.adoc/) { file ->  // <2>
 }
 
 println "\nFrequency of total hits mentioning a project:"
-histogram.sort { e -> -e.value }.each { k, v -> // <8>
-    var label = "$k ($v)"
-    println "${label.padRight(32)} ${bar(v, 0, 50, 50)}"
-}
+display(histogram.sort { e -> -e.value }, 50) // <8>
 ----
 <1> This is a map which provides a default value for non-existent keys
 <2> This traverses the directory processing each AsciiDoc file
@@ -124,7 +136,7 @@ groovy-2-5-clibuilder-renewal.adoc: [apache&nbsp;commons cli:2]
 groovy-graph-databases.adoc: [apache&nbsp;age:11, apache&nbsp;hugegraph:3, apache&nbsp;tinkerpop:3]
 groovy-haiku-processing.adoc: [eclipse&nbsp;collections:3]
 groovy-list-processing-cheat-sheet.adoc: [eclipse&nbsp;collections:4, apache&nbsp;commons collections:3]
-groovy-lucene.adoc: [apache&nbsp;nutch:1, apache&nbsp;solr:1, apache&nbsp;lucene:2, apache&nbsp;commons:1, apache&nbsp;commons math:2]
+groovy-lucene.adoc: [apache&nbsp;nutch:1, apache&nbsp;solr:1, apache&nbsp;lucene:3, apache&nbsp;commons:4, apache&nbsp;commons math:2, apache&nbsp;spark:1]
 groovy-null-processing.adoc: [eclipse&nbsp;collections:6, apache&nbsp;commons collections:4]
 groovy-pekko-gpars.adoc: [apache&nbsp;pekko:4]
 groovy-record-performance.adoc: [apache&nbsp;commons codec:1]
@@ -141,33 +153,33 @@ wordle-checker.adoc: [eclipse&nbsp;collections:3]
 zipping-collections-with-groovy.adoc: [eclipse&nbsp;collections:4]
 
 Frequency of total hits mentioning a project:
-eclipse&nbsp;collections (50)         ██████████████████████████████████████████████████▏
-apache&nbsp;commons math (18)         ██████████████████▏
-apache&nbsp;ignite (17)               █████████████████▏
-apache&nbsp;spark (13)                █████████████▏
-apache&nbsp;mxnet (12)                ████████████▏
-apache&nbsp;wayang (11)               ███████████▏
-apache&nbsp;age (11)                  ███████████▏
-eclipse&nbsp;deeplearning4j (8)       ████████▏
-apache&nbsp;commons collections (7)   ███████▏
-apache&nbsp;commons csv (6)           ██████▏
-apache&nbsp;nlpcraft (5)              █████▏
-apache&nbsp;pekko (4)                 ████▏
-apache&nbsp;hugegraph (3)             ███▏
-apache&nbsp;tinkerpop (3)             ███▏
-apache&nbsp;flink (2)                 ██▏
-apache&nbsp;commons cli (2)           ██▏
-apache&nbsp;lucene (2)                ██▏
-apache&nbsp;commons (2)               ██▏
-apache&nbsp;opennlp (2)               ██▏
-apache&nbsp;ofbiz (1)                 █▏
-apache&nbsp;beam (1)                  █▏
-apache&nbsp;commons numbers (1)       █▏
-apache&nbsp;nutch (1)                 █▏
-apache&nbsp;solr (1)                  █▏
-apache&nbsp;commons codec (1)         █▏
-apache&nbsp;commons io (1)            █▏
-apache&nbsp;kie (1)                   █▏
+eclipse&nbsp;collections (50)         <span style="color:blue">██████████████████████████████████████████████████</span>▏
+apache&nbsp;commons math (18)         <span style="color:purple">██████████████████</span>▏
+apache&nbsp;ignite (17)               <span style="color:purple">█████████████████</span>▏
+apache&nbsp;spark (14)                <span style="color:purple">██████████████</span>▏
+apache&nbsp;mxnet (12)                <span style="color:purple">████████████</span>▏
+apache&nbsp;wayang (11)               <span style="color:purple">███████████</span>▏
+apache&nbsp;age (11)                  <span style="color:purple">███████████</span>▏
+eclipse&nbsp;deeplearning4j (8)       <span style="color:blue">████████</span>▏
+apache&nbsp;commons collections (7)   <span style="color:purple">███████</span>▏
+apache&nbsp;commons csv (6)           <span style="color:purple">██████</span>▏
+apache&nbsp;nlpcraft (5)              <span style="color:purple">█████</span>▏
+apache&nbsp;pekko (4)                 <span style="color:purple">████</span>▏
+apache&nbsp;hugegraph (3)             <span style="color:purple">███</span>▏
+apache&nbsp;tinkerpop (3)             <span style="color:purple">███</span>▏
+apache&nbsp;lucene (3)                <span style="color:purple">███</span>▏
+apache&nbsp;flink (2)                 <span style="color:purple">██</span>▏
+apache&nbsp;commons cli (2)           <span style="color:purple">██</span>▏
+apache&nbsp;commons (2)               <span style="color:purple">██</span>▏
+apache&nbsp;opennlp (2)               <span style="color:purple">██</span>▏
+apache&nbsp;ofbiz (1)                 <span style="color:purple">█</span>▏
+apache&nbsp;beam (1)                  <span style="color:purple">█</span>▏
+apache&nbsp;commons numbers (1)       <span style="color:purple">█</span>▏
+apache&nbsp;nutch (1)                 <span style="color:purple">█</span>▏
+apache&nbsp;solr (1)                  <span style="color:purple">█</span>▏
+apache&nbsp;commons codec (1)         <span style="color:purple">█</span>▏
+apache&nbsp;commons io (1)            <span style="color:purple">█</span>▏
+apache&nbsp;kie (1)                   <span style="color:purple">█</span>▏
 </pre>
 ++++
 
@@ -205,6 +217,7 @@ class ProjectNameAnalyzer extends Analyzer {
 
 Let's now tokenize our documents and let Lucene index them.
 
+// LuceneWithRegexAnalyzer.groovy
 [source,groovy]
 ----
 var analyzer = new ProjectNameAnalyzer() // <1>
@@ -268,19 +281,13 @@ println "\nFrequency of total hits mentioning a project (top 10):"
 var termFreq = terms.collectEntries { term ->
     [term.text(), reader.totalTermFreq(term)]  // <3>
 }
-termFreq.sort(byReverseValue).take(10).each { k, v ->
-    var label = "$k ($v)"
-    println "${label.padRight(32)} ${bar(v, 0, 50, 50)}"
-}
+display(termFreq.sort(byReverseValue).take(10), 50)
 
 println "\nFrequency of documents mentioning a project (top 10):"
 var docFreq = terms.collectEntries { term ->
     [term.text(), reader.docFreq(term)]  // <4>
 }
-docFreq.sort(byReverseValue).take(10).each { k, v ->
-    var label = "$k ($v)"
-    println "${label.padRight(32)} ${bar(v * 2, 0, 20, 20)}"
-}
+display(docFreq.sort(byReverseValue).take(10), 20, 2)
 ----
 <1> Get all index terms
 <2> Look for terms which match project names, so we can save them to a set
@@ -305,7 +312,7 @@ groovy-2-5-clibuilder-renewal.adoc: [apache&nbsp;commons cli:2]
 groovy-graph-databases.adoc: [apache&nbsp;age:11, apache&nbsp;hugegraph:3, apache&nbsp;tinkerpop:3]
 groovy-haiku-processing.adoc: [eclipse&nbsp;collections:3]
 groovy-list-processing-cheat-sheet.adoc: [apache&nbsp;commons collections:3, eclipse&nbsp;collections:4]
-groovy-lucene.adoc: [apache&nbsp;commons:1, apache&nbsp;commons math:2, apache&nbsp;lucene:2, apache&nbsp;nutch:1, apache&nbsp;solr:1]
+groovy-lucene.adoc: [apache&nbsp;commons:4, apache&nbsp;commons math:2, apache&nbsp;lucene:3, apache&nbsp;nutch:1, apache&nbsp;solr:1, apache&nbsp;spark:1]
 groovy-null-processing.adoc: [apache&nbsp;commons collections:4, eclipse&nbsp;collections:6]
 groovy-pekko-gpars.adoc: [apache&nbsp;pekko:4]
 groovy-record-performance.adoc: [apache&nbsp;commons codec:1]
@@ -322,28 +329,28 @@ wordle-checker.adoc: [eclipse&nbsp;collections:3]
 zipping-collections-with-groovy.adoc: [eclipse&nbsp;collections:4]
 
 Frequency of total hits mentioning a project (top 10):
-eclipse&nbsp;collections (50)         ██████████████████████████████████████████████████▏
-apache&nbsp;commons math (17)         █████████████████▏
-apache&nbsp;ignite (17)               █████████████████▏
-apache&nbsp;spark (13)                █████████████▏
-apache&nbsp;mxnet (12)                ████████████▏
-apache&nbsp;wayang (11)               ███████████▏
-apache&nbsp;age (11)                  ███████████▏
-eclipse&nbsp;deeplearning4j (8)       ████████▏
-apache&nbsp;commons collections (7)   ███████▏
-apache&nbsp;commons csv (6)           ██████▏
+eclipse&nbsp;collections (50)         <span style="color:blue">██████████████████████████████████████████████████</span>▏
+apache&nbsp;commons math (17)         <span style="color:purple">█████████████████</span>▏
+apache&nbsp;ignite (17)               <span style="color:purple">█████████████████</span>▏
+apache&nbsp;spark (14)                <span style="color:purple">██████████████</span>▏
+apache&nbsp;mxnet (12)                <span style="color:purple">████████████</span>▏
+apache&nbsp;wayang (11)               <span style="color:purple">███████████</span>▏
+apache&nbsp;age (11)                  <span style="color:purple">███████████</span>▏
+eclipse&nbsp;deeplearning4j (8)       <span style="color:blue">████████</span>▏
+apache&nbsp;commons collections (7)   <span style="color:purple">███████</span>▏
+apache&nbsp;commons csv (6)           <span style="color:purple">██████</span>▏
 
 Frequency of documents mentioning a project (top 10):
-eclipse&nbsp;collections (10)         ████████████████████▏
-apache&nbsp;commons math (7)          ██████████████▏
-apache&nbsp;spark (5)                 ██████████▏
-apache&nbsp;ignite (4)                ████████▏
-apache&nbsp;commons csv (4)           ████████▏
-eclipse&nbsp;deeplearning4j (3)       ██████▏
-apache&nbsp;wayang (3)                ██████▏
-apache&nbsp;flink (2)                 ████▏
-apache&nbsp;commons collections (2)   ████▏
-apache&nbsp;commons (2)               ████▏
+eclipse&nbsp;collections (10)         <span style="color:blue">████████████████████</span>▏
+apache&nbsp;commons math (7)          <span style="color:purple">██████████████</span>▏
+apache&nbsp;spark (6)                 <span style="color:purple">██████████</span>▏
+apache&nbsp;ignite (4)                <span style="color:purple">████████</span>▏
+apache&nbsp;commons csv (4)           <span style="color:purple">████████</span>▏
+eclipse&nbsp;deeplearning4j (3)       <span style="color:blue">██████</span>▏
+apache&nbsp;wayang (3)                <span style="color:purple">██████</span>▏
+apache&nbsp;flink (2)                 <span style="color:purple">████</span>▏
+apache&nbsp;commons collections (2)   <span style="color:purple">████</span>▏
+apache&nbsp;commons (2)               <span style="color:purple">████</span>▏
 
 </pre>
 ++++
@@ -396,6 +403,7 @@ pick out the terms of interest, project names that match our query.
 For the highlight functionality to work, we ask the indexer to store some additional information
 when indexing, in particular term positions and offsets. The index code changes to look like this:
 
+// Lucene.groovy
 [source,groovy]
 ----
 new IndexWriter(indexDir, config).withCloseable { writer ->
@@ -451,10 +459,7 @@ results.scoreDocs.each { ScoreDoc scoreDoc -> // <3>
 }
 
 println "\nFrequency of total hits mentioning a project (top 10):"
-histogram.sort { e -> -e.value }.take(10).each { k, v -> // <6>
-    var label = "$k ($v)"
-    println "${label.padRight(32)} ${bar(v, 0, 50, 50)}"
-}
+display(histogram.sort { e -> -e.value }.take(10), 50) // <6>
 ----
 <1> Search for terms with the apache or eclipse prefixes
 <2> Perform our query with a limit of 30 results
@@ -485,7 +490,7 @@ fun-with-obfuscated-groovy.adoc: [apache&nbsp;commons math:1]
 groovy-2-5-clibuilder-renewal.adoc: [apache&nbsp;commons cli:2]
 groovy-graph-databases.adoc: [apache&nbsp;age:11, apache&nbsp;hugegraph:3, apache&nbsp;tinkerpop:3]
 groovy-haiku-processing.adoc: [eclipse&nbsp;collections:3]
-groovy-lucene.adoc: [apache&nbsp;nutch:1, apache&nbsp;solr:1, apache&nbsp;lucene:2, apache&nbsp;commons:1, apache&nbsp;commons math:2]
+groovy-lucene.adoc: [apache&nbsp;nutch:1, apache&nbsp;solr:1, apache&nbsp;lucene:3, apache&nbsp;commons:4, apache&nbsp;commons math:2, apache&nbsp;spark:1]
 groovy-pekko-gpars.adoc: [apache&nbsp;pekko:4]
 groovy-record-performance.adoc: [apache&nbsp;commons codec:1]
 handling-byte-order-mark-characters.adoc: [apache&nbsp;commons io:1]
@@ -500,16 +505,16 @@ wordle-checker.adoc: [eclipse&nbsp;collections:3]
 zipping-collections-with-groovy.adoc: [eclipse&nbsp;collections:4]
 
 Frequency of total hits mentioning a project (top 10):
-eclipse&nbsp;collections (50)         ██████████████████████████████████████████████████▏
-apache&nbsp;commons math (18)         ██████████████████▏
-apache&nbsp;ignite (17)               █████████████████▏
-apache&nbsp;spark (13)                █████████████▏
-apache&nbsp;mxnet (12)                ████████████▏
-apache&nbsp;wayang (11)               ███████████▏
-apache&nbsp;age (11)                  ███████████▏
-eclipse&nbsp;deeplearning4j (8)       ████████▏
-apache&nbsp;commons collections (7)   ███████▏
-apache&nbsp;commons csv (6)           ██████▏
+eclipse&nbsp;collections (50)         <span style="color:blue">██████████████████████████████████████████████████</span>▏
+apache&nbsp;commons math (18)         <span style="color:purple">██████████████████</span>▏
+apache&nbsp;ignite (17)               <span style="color:purple">█████████████████</span>▏
+apache&nbsp;spark (14)                <span style="color:purple">█████████████</span>▏
+apache&nbsp;mxnet (12)                <span style="color:purple">████████████</span>▏
+apache&nbsp;wayang (11)               <span style="color:purple">███████████</span>▏
+apache&nbsp;age (11)                  <span style="color:purple">███████████</span>▏
+eclipse&nbsp;deeplearning4j (8)       <span style="color:blue">████████</span>▏
+apache&nbsp;commons collections (7)   <span style="color:purple">███████</span>▏
+apache&nbsp;commons csv (6)           <span style="color:purple">██████</span>▏
 
 </pre>
 ++++
@@ -563,6 +568,7 @@ We'll use our regex to find project names and store the information in our facet
 Lucene creates a special _taxonomy_ index for indexing facet information.
 We'll also enable that.
 
+// LuceneFacets.groovy
 [source,groovy]
 ----
 var analyzer = new ProjectNameAnalyzer()
@@ -626,7 +632,7 @@ groovy-2-5-clibuilder-renewal.adoc: [apache&nbsp;commons cli:2]
 groovy-graph-databases.adoc: [apache&nbsp;age:11, apache&nbsp;hugegraph:3, apache&nbsp;tinkerpop:3]
 groovy-haiku-processing.adoc: [eclipse&nbsp;collections:3]
 groovy-list-processing-cheat-sheet.adoc: [eclipse&nbsp;collections:4, apache&nbsp;commons collections:3]
-groovy-lucene.adoc: [apache&nbsp;nutch:1, apache&nbsp;solr:1, apache&nbsp;lucene:2, apache&nbsp;commons:1, apache&nbsp;commons math:2]
+groovy-lucene.adoc: [apache&nbsp;nutch:1, apache&nbsp;solr:1, apache&nbsp;lucene:3, apache&nbsp;commons:4, apache&nbsp;commons math:2, apache&nbsp;spark:1]
 groovy-null-processing.adoc: [eclipse&nbsp;collections:6, apache&nbsp;commons collections:4]
 groovy-pekko-gpars.adoc: [apache&nbsp;pekko:4]
 groovy-record-performance.adoc: [apache&nbsp;commons codec:1]
@@ -665,16 +671,10 @@ var projects = new TaxonomyFacetIntAssociations('$projectHitCounts', taxonReader
 var hitData = projects.getTopChildren(topN, 'projectHitCounts').labelValues
 
 println "\nFrequency of total hits mentioning a project (top $topN):"
-hitData.each { m ->
-    var label = "$m.label ($m.value)"
-    println "${label.padRight(32)} ${bar(m.value, 0, 50, 50)}"
-}
+display(hitData.collectEntries{ lv -> [lv.label, lv.value] }, 50)
 
 println "\nFrequency of documents mentioning a project (top $topN):"
-hitData.each { m ->
-    var label = "$m.label ($m.count)"
-    println "${label.padRight(32)} ${bar(m.count * 2, 0, 20, 20)}"
-}
+display(hitData.collectEntries{ lv -> [lv.label, lv.count] }, 20, 2)
 ----
 
 When running this we can see the frequencies for the total hits and number of files:
@@ -683,25 +683,22 @@ When running this we can see the frequencies for the total hits and number of fi
 ++++
 <pre>
 Frequency of total hits mentioning a project (top 5):
-eclipse&nbsp;collections (50)         ██████████████████████████████████████████████████▏
-apache&nbsp;commons math (18)         ██████████████████▏
-apache&nbsp;ignite (17)               █████████████████▏
-apache&nbsp;spark (13)                █████████████▏
-apache&nbsp;mxnet (12)                ████████████▏
+eclipse&nbsp;collections (50)         <span style="color:blue">██████████████████████████████████████████████████</span>▏
+apache&nbsp;commons math (18)         <span style="color:purple">██████████████████</span>▏
+apache&nbsp;ignite (17)               <span style="color:purple">█████████████████</span>▏
+apache&nbsp;spark (14)                <span style="color:purple">██████████████</span>▏
+apache&nbsp;mxnet (12)                <span style="color:purple">████████████</span>▏
 
 Frequency of documents mentioning a project (top 5):
-eclipse&nbsp;collections (10)         ████████████████████▏
-apache&nbsp;commons math (7)          ██████████████▏
-apache&nbsp;spark (5)                 ██████████▏
-apache&nbsp;ignite (4)                ████████▏
-apache&nbsp;mxnet (1)                 ██▏
+eclipse&nbsp;collections (10)         <span style="color:blue">████████████████████</span>▏
+apache&nbsp;commons math (7)          <span style="color:purple">██████████████</span>▏
+apache&nbsp;ignite (4)                <span style="color:purple">████████</span>▏
+apache&nbsp;spark (6)                 <span style="color:purple">████████████</span>▏
+apache&nbsp;mxnet (1)                 <span style="color:purple">██</span>▏
 
 </pre>
 ++++
 
-NOTE: At the time of writing, there is a bug in sorting for the second of these graphs.
-A https://github.com/apache/lucene/issues/14008[fix] is coming.
-
 Now, the taxonomy information about document frequency is for the top hits scored using the number of hits.
 One of our other facets (`projectFileCounts`) tracks document frequency independently.
 Let's look at how we can query that information:
@@ -724,7 +721,7 @@ Frequency of documents mentioning a project (top 5):
 dim=projectFileCounts path=[] value=-1 childCount=27
   eclipse&nbsp;collections (10)
   apache&nbsp;commons math (7)
-  apache&nbsp;spark (5)
+  apache&nbsp;spark (6)
   apache&nbsp;ignite (4)
   apache&nbsp;commons csv (4)
 
@@ -764,7 +761,7 @@ dim=projectNameCounts path=[] value=-1 childCount=2
 Frequency of documents mentioning a project with path [apache] (top 5):
 dim=projectNameCounts path=[apache] value=-1 childCount=18
   commons (16)
-  spark (5)
+  spark (6)
   ignite (4)
   wayang (3)
   flink (2)
@@ -805,6 +802,7 @@ Let's have a look at what the code for that scenario could look like.
 
 First, we'll do indexing with the `StandardAnalyzer`.
 
+// LuceneWithStandardAnalyzer.groovy
 [source,groovy]
 ----
 var analyzer = new StandardAnalyzer()
