Well I have created issue SOLR-4114 on the subject. Patch coming up.
Regards, Per Steffensen
Per Steffensen wrote:
Mark Miller wrote:
The Collections API was fairly rushed - so that 4.0 had something easier than
the CoreAdmin API.
Yes I see. Our collection-creation code is more sophisticated than
yours. We probably would like to migrate to the Solr Collection API
now anyway - to be using it already when features are added later.
Due to that, it has a variety of limitations:
1. It only picks instances for a collection one way - randomly from the list of
live instances. This means it's no good for multiple shards on the same
instance. You should have enough instances to satisfy numShards X
replicationFactor (although just being short on replicationFactor will
currently just use what is there)
Well I think it shuffles the list of live nodes and then begins
assigning shards from one end. That is OK for us for now. But it will
not start over in the list of live nodes when there are more shards
(shards * replicas) than instances. This could easily be achieved
without making a very fancy allocation algorithm.
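A minimal sketch of the "start over in the list" idea, in Python for brevity. It is not Solr's actual code: the function name, the node names and the "shardN_replicaM" naming are assumptions made for illustration. It shuffles the live nodes and hands out shard replicas in order, wrapping around with a modulo when the list runs out.

```python
import random
from typing import Dict, List, Optional


def assign_round_robin(live_nodes: List[str], num_shards: int,
                       replication_factor: int,
                       rng: Optional[random.Random] = None) -> Dict[str, List[str]]:
    """Shuffle the live nodes, then hand out shard replicas in order,
    wrapping around to the start of the list when it runs out.

    Hypothetical helper for illustration; not part of Solr.
    """
    if not live_nodes:
        raise ValueError("no live nodes to assign to")
    nodes = list(live_nodes)          # copy so the caller's list is untouched
    (rng or random).shuffle(nodes)    # same random starting order as described
    assignment: Dict[str, List[str]] = {node: [] for node in nodes}
    slot = 0
    for shard in range(1, num_shards + 1):
        for replica in range(1, replication_factor + 1):
            node = nodes[slot % len(nodes)]  # the modulo is the "start over" step
            assignment[node].append(f"shard{shard}_replica{replica}")
            slot += 1
    return assignment


if __name__ == "__main__":
    # 4 shards x 2 replicas on only 3 instances: 8 cores spread as 3/3/2.
    for node, cores in assign_round_robin(["n1", "n2", "n3"], 4, 2).items():
        print(node, cores)
```

Note that with fewer instances than replicationFactor this simple wrap-around can put two replicas of the same shard on one instance, so a real implementation would probably want to guard against that.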
2. It randomly chooses which instances to use rather than allowing manual
specification or looking at existing cores.
A manual spec would be nice, to be able to control everything if you
really want to. But you probably also want to provide different built-in
shard-allocation strategies that can be used out of the box, e.g. an
"AlwaysAssignNextShardToInstanceWithFewestShardsAlready" strategy. But
there are also other concerns that might be more interesting for
people to have built into assignment algorithms, e.g. a rack-aware
algorithm that assigns replicas of the same slice to instances running
on different "racks".
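To make the two strategies concrete, here is a rough sketch in Python. None of this is an existing Solr API: the function names, the shards_per_node counter and the rack_of mapping are all assumptions made for illustration. The first function picks the instance with the fewest shards already; the second prefers instances on racks that do not yet hold a replica of the slice and falls back to fewest-shards when every rack is used.

```python
from typing import Dict, List, Set


def pick_fewest_shards(shards_per_node: Dict[str, int]) -> str:
    """Return the node hosting the fewest shards (ties broken by node name)."""
    return min(sorted(shards_per_node), key=lambda n: shards_per_node[n])


def pick_rack_aware(shards_per_node: Dict[str, int], rack_of: Dict[str, str],
                    used_racks: Set[str]) -> str:
    """Prefer nodes on racks that do not yet hold a replica of this slice."""
    fresh = {n: c for n, c in shards_per_node.items()
             if rack_of[n] not in used_racks}
    # If every rack already holds a replica, fall back to plain fewest-shards.
    return pick_fewest_shards(fresh or shards_per_node)


def assign_slice(replication_factor: int, shards_per_node: Dict[str, int],
                 rack_of: Dict[str, str]) -> List[str]:
    """Choose one node per replica of a single slice, updating the counters."""
    chosen: List[str] = []
    used_racks: Set[str] = set()
    for _ in range(replication_factor):
        node = pick_rack_aware(shards_per_node, rack_of, used_racks)
        shards_per_node[node] += 1
        used_racks.add(rack_of[node])
        chosen.append(node)
    return chosen


if __name__ == "__main__":
    counts = {"n1": 0, "n2": 0, "n3": 0, "n4": 0}
    racks = {"n1": "a", "n2": "a", "n3": "b", "n4": "b"}
    print(assign_slice(2, counts, racks))  # one replica per rack
```

Keeping each strategy as a small function with the same shape would let a manual spec simply be one more strategy that returns the nodes the user listed.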
3. You cannot get responses of success or failure other than polling for the
expected results later.
Well we do that anyway, and will keep doing that in our own code for now.
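The kind of polling meant here can be sketched generically, again in Python. The wait_until helper and its parameters are hypothetical, and the check callable is whatever the caller uses to see if the expected cores or shards have shown up; nothing below calls a real Solr API.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float = 30.0,
               interval_s: float = 0.5) -> bool:
    """Call check() repeatedly until it returns True or the timeout expires.

    Returns True on success and False on timeout. Hypothetical helper.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for "does the cluster state show the collection yet?"
    attempts = {"n": 0}

    def collection_is_visible() -> bool:
        attempts["n"] += 1
        return attempts["n"] >= 3

    print(wait_until(collection_is_visible, timeout_s=5.0, interval_s=0.1))
```

A timeout only says the expected state did not appear in time, not why, which is the limitation point 3 describes.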
Someone has a patch up for 3 that I hope to look at soon - others have
contributed bug fixes that will be in 4.1. We still need to add the ability to
control placement in other ways though.
I would say there are def plans, but I don't personally know exactly when I'll
find the time for it, if others don't jump in.
Well I would like to jump in and add support for
running several shards of the same collection on the same instance. It
is just so damn hard to get you to commit stuff :-) and we really don't
want to have too many differences between our Solr and Apache Solr
(and we have enough already - SOLR-3178 etc.). It seems like
several shards on the same instance is the only feature missing
from the Collection API before we can "live with it".
- Mark
Regards, Per Steffensen