[ 
https://issues.apache.org/jira/browse/SOLR-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365075#comment-14365075
 ] 

Shalin Shekhar Mangar commented on SOLR-7256:
---------------------------------------------

{quote}
In solrconfig.xml I would like to be able to provide multiple comma separated 
dataDir paths as you would in say Hadoop and have it use the space on all of 
those disks equally (assuming that every directory specified is a separate disk 
- this is how Hadoop does it).
{quote}

Okay, well, maybe not solrconfig.xml, but we can figure out where this 
configuration lives.

{quote}
This way we would only deploy / manage 1 replica instance per node using the 
normal tooling and it would simply follow the pre-configured solrconfig.xml to 
utilize all the different disks and space.
{quote}

Isn't that kind of thing solved by RAID configurations? I can see a case for 
shard allocation across disks, but what you are describing is either 1) 
spreading index files across multiple directories or 2) having micro-shards 
managed as if they were just one core. Both are quite impractical, imo. The 
former has two problems: the index must be locked atomically so that only one 
writer is active, and index files have different sizes, so spreading them 
evenly across disks is hard. The latter requires a lot of surgery on Solr 
internals and has no benefit over creating multiple shards. This sort of 
thing is easier for Hadoop because the block size is fixed.
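
As a rough illustration of the file-size point (this is not Solr or Hadoop 
code, and the sizes, disk count, and helper names below are made-up 
assumptions), compare placing fixed-size blocks across disks in rotation 
with doing the same for whole index files of uneven size:

```python
# Illustrative sketch only: all sizes are hypothetical numbers in MB,
# not measurements from a real Solr index or HDFS cluster.

def round_robin(sizes, n_disks):
    """Assign items to disks in rotation; return the MB used on each disk."""
    used = [0] * n_disks
    for i, size in enumerate(sizes):
        used[i % n_disks] += size
    return used


def imbalance(used):
    """Gap in MB between the fullest and the emptiest disk."""
    return max(used) - min(used)


N_DISKS = 4
BLOCK_MB = 128  # hypothetical fixed block size

# 4096 MB stored as 32 equal blocks.
fixed_blocks = [BLOCK_MB] * 32

# The same 4096 MB stored as whole index files of uneven size.
segment_files = [2048, 1024, 512, 256, 128, 64, 32, 32]

blocks_used = round_robin(fixed_blocks, N_DISKS)
files_used = round_robin(segment_files, N_DISKS)

print("fixed blocks :", blocks_used, "imbalance", imbalance(blocks_used))
print("segment files:", files_used, "imbalance", imbalance(files_used))
```

In this toy data the largest file alone is bigger than a fair per-disk share 
(4096 / 4 = 1024 MB), so no placement of whole files can balance the disks, 
whereas equal-sized blocks balance trivially.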

> Multiple data dirs
> ------------------
>
>                 Key: SOLR-7256
>                 URL: https://issues.apache.org/jira/browse/SOLR-7256
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 4.10.3
>         Environment: HDP 2.2 / HDP Search
>            Reporter: Hari Sekhon
>
> Request to support multiple dataDirs, as indexing a large collection fills up 
> only one of the many disks in modern servers (think colocating on Hadoop 
> servers with many disks).
> While HDFS is an alternative, it results in poor performance and index 
> corruption under high online indexing loads (SOLR-7255).
> While it should be possible to run multiple cores with different dataDirs, 
> that could be very difficult to manage and would not scale well for the 
> people operating it, so I think Solr should support the use of multiple 
> dataDirs natively.
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

