[
https://issues.apache.org/jira/browse/SOLR-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365075#comment-14365075
]
Shalin Shekhar Mangar commented on SOLR-7256:
---------------------------------------------
{quote}
In solrconfig.xml I would like to be able to provide multiple comma separated
dataDir paths as you would in say Hadoop and have it use the space on all of
those disks equally (assuming that every directory specified is a separate disk
- this is how Hadoop does it).
{quote}
Okay, well maybe not solrconfig.xml but we can figure out where this
configuration lives.
{quote}
This way we would only deploy / manage 1 replica instance per node using the
normal tooling and it would simply follow the pre-configured solrconfig.xml to
utilize all the different disks and space.
{quote}
Isn't that kind of thing solved by RAID configurations? I can see a case for
shard allocation across disks, but what you are describing is either 1)
spreading index files across multiple directories or 2) having micro-shards
managed as if they were just one core. Both are quite impractical, imo. The
former has problems such as atomically locking the index so that only one
writer is active, and index files have different sizes, so spreading them
around evenly is a problem. The latter is a lot of surgery on Solr internals
that has no benefit over creating multiple shards. This sort of thing is
easier for Hadoop because the block size is fixed.
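To illustrate the point about fixed block sizes, here is a rough sketch in
Python. It is not Solr or Hadoop code. The round-robin placement policy, the
helper names, the number of disks, and all sizes (in MB) are assumptions made
up for the example. It compares spreading equally sized blocks across disks
with spreading index files of very different sizes.

```python
def place_round_robin(sizes, num_disks):
    """Assign items to disks in round-robin order; return MB used per disk."""
    used = [0] * num_disks
    for i, size in enumerate(sizes):
        used[i % num_disks] += size
    return used


def imbalance(used):
    """Ratio of the fullest disk to the emptiest one (1.0 means perfectly even)."""
    return max(used) / min(used)


# Hypothetical workload 1: twelve fixed-size 128 MB blocks.
blocks = [128] * 12

# Hypothetical workload 2: index files whose sizes vary widely, in MB
# (a few large merged segments next to many small ones).
index_files = [4000, 5, 120, 3, 900, 8, 60, 2, 2500, 10, 300, 1]

block_usage = place_round_robin(blocks, 4)
file_usage = place_round_robin(index_files, 4)

print("fixed-size blocks:", block_usage, "imbalance:", imbalance(block_usage))
print("index files:      ", file_usage, "imbalance:", imbalance(file_usage))
```

With fixed-size blocks, every disk ends up with the same usage regardless of
placement order. With variable-size files, the same policy leaves one disk
holding almost all of the data. A smarter policy (for example, always picking
the emptiest disk) would narrow the gap, but it cannot remove it when a single
file is much larger than the rest.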
> Multiple data dirs
> ------------------
>
> Key: SOLR-7256
> URL: https://issues.apache.org/jira/browse/SOLR-7256
> Project: Solr
> Issue Type: New Feature
> Affects Versions: 4.10.3
> Environment: HDP 2.2 / HDP Search
> Reporter: Hari Sekhon
>
> Request to support multiple dataDirs, as indexing a large collection fills up
> only one of the many disks in modern servers (think colocating on Hadoop
> servers with many disks).
> While HDFS is another alternative, it results in poor performance and index
> corruption under high online indexing loads (SOLR-7255).
> While it should be possible to use multiple cores with different dataDirs,
> that could be very difficult to manage and would not scale well for the
> people operating it, so I think Solr should support the use of multiple
> dataDirs natively.
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon