Github user dsmiley commented on the issue:
https://github.com/apache/lucene-solr/pull/438
Docs:
`solr/solr-ref-guide/src/uploading-data-with-solr-cell-using-apache-tika.adoc`
Alternatively, perhaps leave that part to me, or take a first cut at it and
I'll edit from there, which I'm likely to do anyway. Either way works.
RE the default config: I'm concerned we're not testing the default config
well. Here's what I recommend for
solrconfig-parsing-update-processor-chains.xml: there should be one
date config that is intentionally identical to the `_default` configSet's
configuration of this URP, with a name indicating this, like
"parse-date-default-configSet". Every other configuration of this URP in
the file should have exactly one pattern and exist only to test a particular
condition, e.g. the effect of a locale, an interesting pattern, or
something like that. I realize this recommendation would mean stomping over a
bit of the changes from a previous issue, e.g. removing
`parse-date-patterns-from-extract-contrib`, but I think the test code should
stay the same, apart from each test referring to a new or renamed parse-date
processor config.
To me, the merging of patterns in ExtractDateUtils into the default URP
patterns seems distinct enough from removing ExtractDateUtils that it ought to
occur in its own issue.
BTW, I was looking at DateTimeFormatter.RFC_1123_DATE_TIME and noticed that the
leading day of week is optional. We should update our pattern accordingly and
test that it is optional. It'd be nice if we had a feature like
`<str>RFC_1123_DATE_TIME</str>` where we could refer to a well-known pattern by
its well-known name and in turn get the pre-compiled instance in
DateTimeFormatter. But that's not in scope here.
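A minimal Java sketch of both points, assuming Java 9+ (for `Map.of`). `RfcDates`, `resolve`, and the `WELL_KNOWN` map are hypothetical names, not Solr APIs, and the custom pattern `[EEE, ]d MMM yyyy HH:mm:ss z` only illustrates how an optional day-of-week section might be written. It is not the pattern currently in any configSet. The sketch parses the same timestamp with and without the leading day of week, using both the JDK's pre-compiled formatter and the custom pattern.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Map;

public class RfcDates {

    // Hypothetical registry mapping well-known names to the JDK's pre-compiled
    // formatter constants. This is an illustration, not an existing Solr feature.
    private static final Map<String, DateTimeFormatter> WELL_KNOWN = Map.of(
            "RFC_1123_DATE_TIME", DateTimeFormatter.RFC_1123_DATE_TIME,
            "ISO_INSTANT", DateTimeFormatter.ISO_INSTANT);

    /**
     * Returns the pre-compiled formatter if the argument is a well-known name;
     * otherwise compiles the argument as a pattern using English month/day names.
     */
    public static DateTimeFormatter resolve(String nameOrPattern) {
        DateTimeFormatter known = WELL_KNOWN.get(nameOrPattern);
        return known != null ? known : DateTimeFormatter.ofPattern(nameOrPattern, Locale.ENGLISH);
    }

    /** Parses the text with the given formatter and converts the result to an Instant. */
    public static Instant parse(DateTimeFormatter formatter, String text) {
        return formatter.parse(text, Instant::from);
    }

    public static void main(String[] args) {
        DateTimeFormatter jdk = resolve("RFC_1123_DATE_TIME");
        // Illustrative custom pattern: the "[EEE, ]" optional section makes the
        // leading day of week optional, mirroring the JDK formatter.
        DateTimeFormatter custom = resolve("[EEE, ]d MMM yyyy HH:mm:ss z");

        String[] inputs = {"Tue, 3 Jun 2008 11:05:30 GMT", "3 Jun 2008 11:05:30 GMT"};
        for (String text : inputs) {
            System.out.println(text + " -> " + parse(jdk, text) + " / " + parse(custom, text));
        }
    }
}
```

Because `resolve` hands back the JDK constant itself for a well-known name, nothing is recompiled. When the day of week is present, the JDK formatter still cross-checks it against the date, so a contradictory value such as `Wed, 3 Jun 2008 11:05:30 GMT` is rejected.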