[
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216223#comment-16216223
]
Gus Heck commented on SOLR-11487:
---------------------------------
Thx for the review Dave.
I'll start in on some of the fix-ups, here's some of the whys for what I did, I
can probably be talked out of anything here, but this is what I was thinking...
Let me know what you think.
*Fix List*
* Will add test for removal of alias case and fix it.
* docs: yes of course good point :)
* names: sure sounds fine :)
* I had initially started supporting lists and then decided to axe that until
discussion. I will move back to <String,String> from <String,Object>..
----
*Warnings*: this gets difficult because we don't really *have* type safety
anymore...
We have a Map<String,X> with two keys "collection" and "collection_metadata"
the values for these two keys don't match. The former is Map<String,String> and
the later is a Map<String,Map<String,String>> String and Map are not
convertible types so one use case or the other wont compile... unless you back
off from type safety. To achieve type safety we either need to keep two
separate maps or we need to be serializing an actual object hierarchy rather
than collection classes.
----
*The ZkStateReader field* is necessary because we need to get the metadata back
to zookeeper. Based on your earlier response we won't have a collections API
call to set metadata, so we need to have a ZkStateReader somehow or nothing
gets written to zookeeper. In the case of cloneWithCollectionAlias, yeah that
can be eliminated there, good catch. I can add an overload for the signature
you suggest as well for the case where both are to be updated at the same time,
but WRT removing ZkStateReader entirely, see comments about immutability
below...
----
*Immutability* is nice of course, and great for things that are immutable, or
only held for a short duration, but once you have a long held reference and the
underlying data is actually mutable, it gets difficult to be sure nobody is
retaining a reference a stale copy every time a change is made... A little
digging reveals that CoreContainer has a ZkContainer which has a ZkController,
which has a ZkStateReader, which therefore holds onto an immutable copy of
Aliases for a difficult to determine time frame....
The existing cloning/immutability scheme therefore worries me. It seems like it
would make more sense for Aliases to function as a wrapper around the
(fundamentally mutable) json in zookeeper. If we never want to know if there
was a change after we get our initial copy, and we never give away a reference
to the copy we got, and we never retain that copy after we make an update we
could have immutable copies... hard to make those stick however. It might be
that we want a snapshot at the start of a request that doesn't change for the
duration request (I can imagine that getting funky fast), but the long held
versions need to be mutable I think... maybe a mutable super class and an
immutable subclass that throws UnsupportedOperation on mutators?
----
The *Collections.EMPTY_MAP* came up because I can't put anything into an empty
map, and need to test if that's what we currently have or if we have a regular
map that I can add stuff to. Collections.emptyMap() is not required to return
the same instance, or any particular implementation class, so in order to test
for it and not be subject to breakage in odd VM's or future versions (or past
versions?) I have to use the single unique instance in Collections.EMPTY_MAP.
I've grown slightly unsure as to whether that if block is still necessary, a
possible hold over from early versions of this code, so I might give a shot at
eliminating it and go back to Collections.emptyMap().
----
I *moved CRUD stuff* to ZkStateReader so I didn't have to duplicate it in
Aliases to get metadata written back to zookeeper. Also it feels reasonable to
have a ZK class doing the Zk CRUD rather than having that code live in a
command class that grabs ZkClient from ZkStateReader and writes the data
directly itself... (law of Demeter etc). This way, the command does command
type stuff like identify the data to be written and validation to be sure we
really do want to write the data and then hands the data off to the thing that
knows how to work with zookeeper data so it can do the actual writing...
Controller & service/DAO stuff. The present code seems like the
controller/action in a web app firing up a JDBC connection directly...
----
*TimeOut*... initially I was going to copy it over verbatim since it was in
core and core is not available in solrj (when I moved the CRUD to
ZkStateReader) and then I realized it could be improved so I improved it. I
think perhaps this timeout and the one in Core could be reconciled and moved to
a commonly available location to facilitate re-use, but that seems like a
separate endeavor not part of this ticket. The previous checkForAlias() method
was running a tight loop with no sleep inside the loop for up to 30 seconds
(evidently waiting for another thread to call updateAliases() to replace our
"immutable" aliases with a copy that had the alias that we were meant to add...
This probably soaks up an entire processor core while we wait. Since the use
case is to block the current thread until a condition is observed, an api that
forces you to set a sleep delay seems beneficial.
I could possibly also just write the poling loop directly without the timeout
class... but I was just following from the code I had brought over from the
command class in this case, perhaps a bit too blindly.
> Collection Alias metadata for time partitioned collections
> ----------------------------------------------------------
>
> Key: SOLR-11487
> URL: https://issues.apache.org/jira/browse/SOLR-11487
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: David Smiley
> Attachments: SOLR_11487.patch, SOLR_11487.patch
>
>
> SOLR-11299 outlines an approach to using a collection Alias to refer to a
> series of collections of a time series. We'll need to store some metadata
> about these time series collections, such as which field of the document
> contains the timestamp to route on.
> The current {{/aliases.json}} is a Map with a key {{collection}} which is in
> turn a Map of alias name strings to a comma delimited list of the collections.
> _If we change the comma delimited list to be another Map to hold the existing
> list and more stuff, older CloudSolrClient (configured to talk to ZooKeeper)
> will break_. Although if it's configured with an HTTP Solr URL then it would
> not break. There's also some read/write hassle to worry about -- we may need
> to continue to read an aliases.json in the older format.
> Alternatively, we could add a new map entry to aliases.json, say,
> {{collection_metadata}} keyed by alias name?
> Perhaps another very different approach is to attach metadata to the
> configset in use?
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]