Andrew Kettmann created SOLR-13563:
--------------------------------------

             Summary: SPLITSHARD - Using LINK method fails on disk usage checks
                 Key: SOLR-13563
                 URL: https://issues.apache.org/jira/browse/SOLR-13563
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: AutoScaling, SolrCloud
    Affects Versions: 7.7.2
            Reporter: Andrew Kettmann
         Attachments: disk_check.patch

I raised this on the mailing list and was told to open an issue, so I am 
copying the context here:

 

I am using the Solr 7.7.2 Docker image to test some of the new autoscaling 
features, and I am a huge fan so far. I tested SPLITSHARD with the link method 
on a 2GB core and found that it took less than 1MB of additional space. I then 
filled the core to 12GB on a 20GB PVC, and now splitting the shard fails with 
the following error message on my overseer:

{code:java}
2019-06-18 16:27:41.754 ERROR (OverseerThreadFactory-49-thread-5-processing-n:10.0.192.74:8983_solr) [c:test_autoscale s:shard1  ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test_autoscale operation: splitshard failed:org.apache.solr.common.SolrException: not enough free disk space to perform index split on node 10.0.193.23:8983_solr, required: 23.35038321465254, available: 7.811378479003906
    at org.apache.solr.cloud.api.collections.SplitShardCmd.checkDiskSpace(SplitShardCmd.java:567)
    at org.apache.solr.cloud.api.collections.SplitShardCmd.split(SplitShardCmd.java:138)
    at org.apache.solr.cloud.api.collections.SplitShardCmd.call(SplitShardCmd.java:94)
    at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:294)
    at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
{code}
 

 

I tried sending the request directly to the node itself to see whether it 
behaved differently, but no luck. My parameters are (in Python formatting, as 
that is my language of choice):

 
{code:java}
splitparams = {'action': 'SPLITSHARD',
               'collection': 'test_autoscale',
               'shard': 'shard1',
               'splitMethod': 'link',
               'timing': 'true',
               'async': 'shardsplitasync'}
{code}
 

 

And this is confirmed by the log message from the node itself:

 
{code:java}
2019-06-18 16:27:41.730 INFO  (qtp1107530534-16) [c:test_autoscale   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={async=shardsplitasync&timing=true&action=SPLITSHARD&collection=test_autoscale&shard=shard1&splitMethod=link} status=0 QTime=20
{code}
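
For anyone trying to reproduce this, here is a minimal sketch of how the 
parameters above can be turned into Collections API URLs. The base URL is an 
assumption (it is not part of the original report), and the second URL uses 
the REQUESTSTATUS action to poll the async request id. Nothing is sent over 
the network here; the sketch only builds the URLs.

```python
from urllib.parse import urlencode

# Assumption: address of any Solr node; not given in the original report.
SOLR_BASE_URL = "http://localhost:8983/solr"

# Same parameters as in the report.
splitparams = {
    "action": "SPLITSHARD",
    "collection": "test_autoscale",
    "shard": "shard1",
    "splitMethod": "link",
    "timing": "true",
    "async": "shardsplitasync",
}


def collections_url(params, base_url=SOLR_BASE_URL):
    """Build a Collections API URL (path /admin/collections) for the params."""
    return f"{base_url}/admin/collections?{urlencode(params)}"


# URL that submits the split asynchronously.
split_url = collections_url(splitparams)

# URL that polls the status of the async request by its id.
status_url = collections_url(
    {"action": "REQUESTSTATUS", "requestid": splitparams["async"]}
)

if __name__ == "__main__":
    print(split_url)
    print(status_url)
```

Either URL can then be fetched with any HTTP client, for example 
`urllib.request.urlopen(split_url)`.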
 

 

While it is true that I would not have enough space if I were using the 
rewrite method, the link method on a 2GB core used less than 1MB of additional 
space. Is there something I am missing here? Is there an option to disable the 
disk space check that I need to pass? I can't find anything in the 
documentation at this point.
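
To make the arithmetic concrete, here is a rough sketch of a disk check that 
takes the split method into account. It is illustrative only: it is not the 
Solr source and not the attached patch. The rewrite multiplier of 2 is 
inferred from the log (the "required" figure looks like twice the size of a 
roughly 12GB index), the link overhead fraction is a purely hypothetical 
value, and the unit is assumed to be GB because the log does not state one.

```python
# Assumption: inferred from the log, where "required" is about twice the index size.
REWRITE_FACTOR = 2.0
# Hypothetical value for illustration; link only needs a small extra allowance.
LINK_OVERHEAD_FRACTION = 0.05


def required_free_space(index_size_gb, split_method):
    """Return the free space (GB) this sketch would demand before splitting."""
    if split_method == "rewrite":
        return REWRITE_FACTOR * index_size_gb
    if split_method == "link":
        return LINK_OVERHEAD_FRACTION * index_size_gb
    raise ValueError(f"unknown split method: {split_method}")


def has_enough_space(index_size_gb, available_gb, split_method):
    """True if the available space covers the requirement for the method."""
    return available_gb >= required_free_space(index_size_gb, split_method)


# Figures taken from the error message in the report.
required_in_log = 23.35038321465254
available_gb = 7.811378479003906
index_size_gb = required_in_log / 2  # implied index size under the 2x assumption

if __name__ == "__main__":
    for method in ("rewrite", "link"):
        ok = has_enough_space(index_size_gb, available_gb, method)
        print(method, required_free_space(index_size_gb, method), ok)
```

Under these assumptions the rewrite check fails on the reported numbers, as it 
did in the log, while a check sized for the link method would pass.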

 

--

 

After this initial email, I found the issue and built Solr with the attached 
patch. Running the modified build on the overseer alone resolved the issue, 
since the overseer is what runs the check.

 

 



