Hello,

I'm revisiting something that I did a long time ago, and looking to tidy
things up but I think I need to add at least one bit of API to do it
(unless there's a different way to go).

I've got an application which indexes to Solr, and it records some metadata
for each collection - e.g. the last database row that was indexed, flags to
indicate whether indexing has been paused, things like that - trivially
represented as a JSON file. For some specific collections I also need to
store an XML file which contains some configuration.

For several reasons (we wanted those files to be included in backup+restore
operations from Solr, and we wanted the metadata to be wiped if a
collection was deleted and recreated) we decided that it made sense to
store the files in ZooKeeper within the nodes used by Solr, so we ended up
storing the metadata properties in the collectionprops.json, and the XML
inside the configset used by the collection (with the collection name used
to uniquely name it within a shared config). The application uses a
ZooKeeper client to directly interact with those files in ZooKeeper.

That has worked well enough so far, but it would definitely be cleaner and
safer to go through Solr Admin APIs instead so that the application doesn't
need to talk to ZooKeeper or know about Solr's internals.

For the properties my first thought was that I could just use the
COLLECTIONPROP API
<https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#collectionprop>
as I'm already storing them in collectionprops.json. However, it doesn't
have a read operation, it only has set and delete operations... Would
adding a GET to the v2 API be reasonable?
I see that https://issues.apache.org/jira/browse/SOLR-12224 did suggest
that would happen as part of
https://issues.apache.org/jira/browse/SOLR-15734, but it doesn't look like
there's an active issue for it. I'd be happy to have a look at implementing
it in a few weeks if it is something that's generally wanted.
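
To make the gap concrete, here is a rough sketch of the two operations
that exist today, built against the v1 endpoint described in the guide
linked above. The base URL, the helper name and the example property name
are my own placeholders, not anything Solr defines.

```python
from urllib.parse import urlencode

# Assumption: a local Solr node on the default port.
SOLR_BASE = "http://localhost:8983/solr"


def collectionprop_url(collection, prop_name, prop_value=None, base=SOLR_BASE):
    """Build a COLLECTIONPROP request URL (hypothetical helper).

    With prop_value the call sets the property; without it the call
    deletes the property. There is no variant that reads it back.
    """
    params = {
        "action": "COLLECTIONPROP",
        "name": collection,
        "propertyName": prop_name,
    }
    if prop_value is not None:
        params["propertyValue"] = prop_value
    return f"{base}/admin/collections?{urlencode(params)}"


# Set, then delete, an illustrative "indexing.paused" flag.
print(collectionprop_url("books", "indexing.paused", "true"))
print(collectionprop_url("books", "indexing.paused"))
```

Reading the value back is the part that still needs the ZooKeeper client.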

And for the XML file, I could use the Configset API
<https://solr.apache.org/guide/solr/latest/configuration-guide/configsets-api.html#configsets-upload>
to upload it, but again there's no API to download the file (nor even the
configset as a whole). Would adding APIs to fetch the content of a
configset be generally useful (or objectionable)?

Configsets aren't the best fit for this anyway, though: the file really
belongs to an individual collection's configuration rather than a shareable
configset. With a read API for collection properties, I could instead
base64 encode my XML and store it as a collection property. The
downside would be that it makes collectionprops.json bigger than it needs
to be and leads to a lot of churn in ZooKeeper: the XML is written once
while the metadata properties are written many times. I don't want Solr to
care about it either; it doesn't need to be watched etc.
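
A minimal sketch of that base64 round trip, just to show the shape and the
size overhead. The helper names and the sample XML are made up for
illustration.

```python
import base64


def xml_to_prop_value(xml_text):
    """Encode XML text as an ASCII-safe string usable as a property value."""
    return base64.b64encode(xml_text.encode("utf-8")).decode("ascii")


def prop_value_to_xml(value):
    """Decode a property value produced by xml_to_prop_value."""
    return base64.b64decode(value).decode("utf-8")


# Illustrative payload only; the real file is larger.
sample = '<config><field name="title"/></config>'
encoded = xml_to_prop_value(sample)

assert prop_value_to_xml(encoded) == sample
# Base64 emits 4 characters per 3 input bytes, so roughly a third bigger.
print(len(sample.encode("utf-8")), "bytes ->", len(encoded), "characters")
```

The encoded value would then sit in the same node as the frequently
updated metadata properties, which is the churn I'd like to avoid.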

So in many ways, what I really would like is to have a generic API to store
and retrieve "stuff" for a collection (which is already achievable using
the ZK client), and have it included in the backup+restore of a collection
(which doesn't happen currently AFAIK, only specific nodes are restored).
In other words, a CRUD API that just delegates straight to the ZooKeeper
client and stores things under the collection (under a well-known node).
Perhaps that is too niche and nobody else would need it? Or maybe I should
solve the whole problem in a different way in the first place?
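
To illustrate the shape I have in mind, here is an in-memory stand-in.
Nothing here exists in Solr: the class, its method names and the
"userdata" node name are all hypothetical, and a real version would
delegate each call to the ZooKeeper client rather than to a dict.

```python
from typing import Dict, List, Optional


class CollectionBlobStore:
    """In-memory stand-in for the proposed per-collection CRUD API."""

    # Placeholder for the "well known node"; the name is an assumption.
    ROOT = "userdata"

    def __init__(self) -> None:
        self._nodes: Dict[str, bytes] = {}

    def _path(self, collection: str, key: str) -> str:
        # Everything lives under the collection's own node.
        return f"/collections/{collection}/{self.ROOT}/{key}"

    def put(self, collection: str, key: str, data: bytes) -> None:
        """Create or overwrite a blob."""
        self._nodes[self._path(collection, key)] = data

    def get(self, collection: str, key: str) -> Optional[bytes]:
        """Return the blob, or None if it does not exist."""
        return self._nodes.get(self._path(collection, key))

    def delete(self, collection: str, key: str) -> bool:
        """Remove a blob; return True if something was removed."""
        return self._nodes.pop(self._path(collection, key), None) is not None

    def keys(self, collection: str) -> List[str]:
        """List the keys stored for one collection."""
        prefix = self._path(collection, "")
        return sorted(p[len(prefix):] for p in self._nodes if p.startswith(prefix))


store = CollectionBlobStore()
store.put("books", "mapping.xml", b"<config/>")
print(store.keys("books"), store.get("books", "mapping.xml"))
```

Keeping everything under one node per collection is what would let
backup+restore and collection deletion pick it up without knowing what the
contents are.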

Regards,
Colvin
