I'd like us to reflect on how we categorize issues in CHANGES.txt. We have
these categories:
(Lucene) 'API Changes', 'New Features', 'Improvements', 'Optimizations',
'Bug Fixes', 'Other'
(Solr) 'New Features', 'Improvements', 'Optimizations', 'Bug Fixes',
'Other Changes'
(I lifted these from dev-tools/scripts/addVersion.py line 215)
In particular, I'm often surprised to see items filed under New Features
or Improvements that would be better categorized as something else. I
think the root cause may be that we don't have JIRA categories that
directly align. Furthermore, our dev practices typically result in the
CHANGES.txt entry being added out of band from the code-review process,
so its placement gets no peer review. The message itself is often not
reviewed either, but should be. Perhaps we can simply get in the habit of
adding a JIRA comment (or GH code review comment) stating what we propose
the category & issue summary should be.
Here is my attempt at a definition for _some_ of these categories. I don't
pretend we all agree 100%, but it's up for discussion:
========
* New Features: A user-visible new capability. Usually opt-in.
* Improvements: A user-visible improvement to an existing capability that
expands its ability or improves its behavior. Not a refactoring, not an
optimization.
* Optimizations: Something is now more efficient. Usually automatic (not
opt-in).
* Other: Anything else: refactorings, tests, build, docs, etc., as well as
added log statements.
========
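One way to read the definitions above is as a small decision procedure. The
sketch below is only illustrative: the Change fields, the suggest_category
name, and the order of the checks are assumptions made for this example, not
anything taken from addVersion.py or JIRA.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical description of a change; all fields are illustrative."""
    user_visible: bool = False     # would a user notice the change at all?
    new_capability: bool = False   # adds something that did not exist before
    efficiency_only: bool = False  # same behavior, just faster or smaller


def suggest_category(change: Change) -> str:
    """Suggest a CHANGES.txt category following the definitions above."""
    # Assumption: efficiency gains win over everything else, so a faster
    # existing feature is an Optimization rather than an Improvement.
    if change.efficiency_only:
        return "Optimizations"
    if change.user_visible and change.new_capability:
        return "New Features"
    if change.user_visible:
        return "Improvements"
    # Refactorings, tests, build, docs, log statements, etc.
    return "Other"
```

For example, an entry that only reduces memory usage would be described as
Change(efficiency_only=True) and land under "Optimizations", while a test
refactoring would be Change() and land under "Other".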
I recommend the following changes to Lucene 8.5:
These "Improvements" I think are better categorized as "Optimizations":
* LUCENE-9211: Add compression for Binary doc value fields. (Mark Harwood)
* LUCENE-4702: Better compression of terms dictionaries. (Adrien Grand)
* LUCENE-9228: Sort dvUpdates in the term order before applying if they all
update a single field to the same value. This optimization can reduce the
flush time by around 20% for the docValues update user cases. (Nhat Nguyen,
Adrien Grand, Simon Willnauer)
* LUCENE-9245: Reduce AutomatonTermsEnum memory usage. (Bruno Roustant,
Robert Muir)
* LUCENE-9237: Faster UniformSplit intersect TermsEnum. (Bruno Roustant)
These "Improvements" I think are better categorized as "Other":
* LUCENE-9109: Backport some changes from master (except StackWalker) to
improve TestSecurityManager (Uwe Schindler)
* LUCENE-9110: Backport refactored stack analysis in tests to use
generalized LuceneTestCase methods (Uwe Schindler)
* LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract
class called LatLonGeometry. Queries are executed with input objects that
extend such interface. (Ignacio Vera)
* LUCENE-9194: Simplify XYShapeXQuery API by adding a new abstract class
called XYGeometry. Queries are executed with input objects that extend
such interface. (Ignacio Vera)
Maybe this "Other" item should be under "Optimizations"? (not sure):
* LUCENE-9068: FuzzyQuery builds its Automaton up-front (Alan Woodward,
Mike Drob)
Solr:
"New Features" that maybe should be "Improvements":
* SOLR-13892: New "top-level" docValues join implementation (Jason
Gerlowski, Joel Bernstein)
* SOLR-14242: HdfsDirectory now supports indexing geo-points, ranges or
shapes. (Adrien Grand)
"Improvements" that maybe should be "Optimizations":
* SOLR-13808: filter in BoolQParser and {"bool":{"filter":..}} in Query DSL
are cached by default (Mikhail Khludnev)
"Improvements" that maybe should be "Other":
* SOLR-14114: Add WARN to Solr log that embedded ZK is not supported in
production (janhoy)
Thoughts?
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley