I'd like us to reflect on how we categorize issues in CHANGES.txt.  We have
these categories:
(Lucene) 'API Changes', 'New Features', 'Improvements', 'Optimizations', 'Bug Fixes', 'Other'
(Solr) 'New Features', 'Improvements', 'Optimizations', 'Bug Fixes', 'Other Changes'
(I lifted these from dev-tools/scripts/addVersion.py line 215)

In particular, I'm often surprised to see changes listed under New
Features or Improvements that would be better categorized as something
else.  I think the root cause may be that we don't have JIRA categories
that align directly with these.  Furthermore, our dev practices typically
result in the CHANGES.txt entry being added out of band from the
code-review process, so there is no peer review of its placement.  The
message itself is often not reviewed either, but it should be.  Perhaps we
can simply get in the habit of adding a JIRA comment (or GH code review
comment) stating what we propose the category & issue summary should be.

Here is my attempt at a definition for _some_ of these categories.  I don't
pretend we all agree 100%, so it's up for discussion:
========
* New Features:  A user-visible new capability.  Usually opt-in.

* Improvements:  A user-visible improvement to an existing capability that
expands what it can do or improves its behavior.  Not a refactoring, not an
optimization.

* Optimizations: Something is now more efficient.  Usually automatic (not
opt-in).

* Other:  Anything else: refactorings, tests, build, docs, etc.  Adding log
statements also belongs here.
========
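
One way to read the definitions above as a decision procedure, sketched in
Python.  The Change fields and the order of the checks are assumptions made
for illustration; this is not taken from dev-tools/scripts/addVersion.py,
and it only covers the four categories defined above.

```python
from dataclasses import dataclass


@dataclass
class Change:
    # Hypothetical fields, chosen only to mirror the wording of the definitions.
    user_visible: bool      # does a user notice the change at all?
    new_capability: bool    # does it add something that did not exist before?
    efficiency_only: bool   # is the only effect that something is now more efficient?


def categorize(change: Change) -> str:
    """Return the proposed CHANGES.txt category for a change."""
    # Assumption: check efficiency first, since "Improvements" is
    # explicitly defined as "not an optimization".
    if change.efficiency_only:
        return "Optimizations"
    if change.user_visible and change.new_capability:
        return "New Features"
    if change.user_visible:
        return "Improvements"
    # Refactorings, tests, build, docs, etc.
    return "Other"
```

For example, a change whose only effect is lower memory usage would come out
as "Optimizations", and a test-only backport would come out as "Other".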

I recommend the following changes to Lucene 8.5:

These "Improvements" I think are better categorized as "Optimizations":
* LUCENE-9211: Add compression for Binary doc value fields. (Mark Harwood)
* LUCENE-4702: Better compression of terms dictionaries. (Adrien Grand)
* LUCENE-9228: Sort dvUpdates in the term order before applying if they all update a
  single field to the same value. This optimization can reduce the flush time by around
  20% for the docValues update user cases. (Nhat Nguyen, Adrien Grand, Simon Willnauer)
* LUCENE-9245: Reduce AutomatonTermsEnum memory usage. (Bruno Roustant,
Robert Muir)
* LUCENE-9237: Faster UniformSplit intersect TermsEnum. (Bruno Roustant)

These "Improvements" I think are better categorized as "Other":
* LUCENE-9109: Backport some changes from master (except StackWalker) to improve
  TestSecurityManager (Uwe Schindler)
* LUCENE-9110: Backport refactored stack analysis in tests to use generalized
  LuceneTestCase methods (Uwe Schindler)
* LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. Queries are
  executed with input objects that extend such interface. (Ignacio Vera)
* LUCENE-9194: Simplify XYShapeXQuery API by adding a new abstract class called XYGeometry. Queries are
  executed with input objects that extend such interface. (Ignacio Vera)

Maybe this "Other" item should be an "Optimization"? (not sure):
* LUCENE-9068: FuzzyQuery builds its Automaton up-front (Alan Woodward,
Mike Drob)

Solr:

"New Features" that maybe should be "Improvements":
 * SOLR-13892: New "top-level" docValues join implementation (Jason
Gerlowski, Joel Bernstein)
 * SOLR-14242: HdfsDirectory now supports indexing geo-points, ranges or
shapes. (Adrien Grand)

"Improvements" that maybe should be "Optimizations":
* SOLR-13808: filter in BoolQParser and {"bool":{"filter":..}} in Query DSL
are cached by default (Mikhail Khludnev)

"Improvements" that maybe should be "Other":
* SOLR-14114: Add WARN to Solr log that embedded ZK is not supported in
production (janhoy)

Thoughts?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
