Re: Solr Shard Splitting Issue with 60 GB index data
Hi Anshum,

Thanks for the reply, but now I'm stuck in a different way. I'm able to split a shard successfully with a smaller amount of index data, i.e. less than 45 GB. However, my actual data volume is more than 70 GB, and when I try to split the shard by the same process, it does not complete successfully. I've tried several times, always with a single shard (70 GB of data). I monitored state.json for about 24 hours, but the parent shard was still in the active state.

I think something goes wrong when the shard's data volume is high, i.e. more than 70 GB. Can you confirm that shard splitting works fine for large amounts of data as well, i.e. has this kind of load been tested? Or are there configuration changes required, such as a timeout value, that I need to take care of? I've already sent you the logs and all my Solr machine configurations.

Please suggest.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Shard-Splitting-Issue-tp4314145p4319149.html
Sent from the Solr - User mailing list archive at Nabble.com.
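[For readers following along: for a long-running split like this, the usual pattern is to submit SPLITSHARD with an `async` request id and then poll REQUESTSTATUS instead of holding an HTTP connection open. A minimal sketch of building those two Collections API calls; the base URL and request id are placeholders, not values from this thread:]

```python
from urllib.parse import urlencode

def splitshard_url(base, collection, shard, async_id):
    """Build an asynchronous SPLITSHARD request for the Collections API."""
    params = {"action": "SPLITSHARD", "collection": collection,
              "shard": shard, "async": async_id}
    return f"{base}/solr/admin/collections?{urlencode(params)}"

def requeststatus_url(base, async_id):
    """Build the REQUESTSTATUS call used to poll the async operation."""
    params = {"action": "REQUESTSTATUS", "requestid": async_id}
    return f"{base}/solr/admin/collections?{urlencode(params)}"

# Placeholder host/port; fetch these URLs with any HTTP client.
print(splitshard_url("http://localhost:8983", "collection1", "shard1", "split-1"))
print(requeststatus_url("http://localhost:8983", "split-1"))
```

The async id is chosen by the caller; polling REQUESTSTATUS with it reports submitted/running/completed/failed, which avoids the browser timeout described above.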
Re: Solr Shard Splitting Issue
I see a successful completion of the request in the logs here:

2017-01-18 14:43:55.439 INFO (OverseerStateUpdate-97304349976428549-10.1.1.78:4983_solr-n_00) [ ] o.a.s.c.o.SliceMutator Update shard state invoked for collection: collection1 with message: {
  "shard1":"inactive",
  "collection":"collection1",
  "shard1_1":"active",
  "operation":"updateshardstate",
  "shard1_0":"active"}
2017-01-18 14:43:55.439 INFO (OverseerStateUpdate-97304349976428549-10.1.1.78:4983_solr-n_00) [ ] o.a.s.c.o.SliceMutator Update shard state shard1 to inactive
2017-01-18 14:43:55.439 INFO (OverseerStateUpdate-97304349976428549-10.1.1.78:4983_solr-n_00) [ ] o.a.s.c.o.SliceMutator Update shard state shard1_1 to active
2017-01-18 14:43:55.439 INFO (OverseerStateUpdate-97304349976428549-10.1.1.78:4983_solr-n_00) [ ] o.a.s.c.o.SliceMutator Update shard state shard1_0 to active

I think you might be looking at the admin UI to figure out the state of the shards, and that might still be broken. Can you confirm the state of the shards from the CLUSTERSTATUS API?

Also, you shouldn't invoke SPLITSHARD on the same shard multiple times, as you did when the non-async version failed.

-Anshum

On Thu, Jan 19, 2017 at 6:12 AM ekta <ekta.bhalw...@e-arc.com> wrote:

> Hi Anshum,
>
> Thanks for the reply.
>
> I had a copy of the data I was experimenting on, and in any case I repeated
> the experiment after I posted the mail. Some points I want to let you know:
>
> 1. This time I did not change the state in state.json.
> 2. Otherwise I did the same steps as above, and the data still got frozen at
>    24 GB in both child shards (my parent shard had ~60 GB).
> 3. state.json is still showing:
>    3.1 Parent - Active
>    3.2 Children - Construction
> 4. Yes, I do have logs; I am attaching the file to this mail. Please check
>    it out.
> 5. I did the shard split with this command in the browser:
>    http://10.1.1.78:4983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
>    and I got a timeout exception in the browser. I am attaching a file with
>    what the browser displayed.
> 6. The details of the system (Amazon EC2 instance) on which I am doing the
>    above steps:
>    6.1 30 GB RAM
>    6.2 4 cores
>    6.3 250 GB drive
> 7. Lastly, I googled the timeout exception I got and found a reply by you on
>    a post about the same issue, where you mentioned issuing the split shard
>    command asynchronously. I tried that too; I no longer got the timeout
>    exception from the browser, but everything else was the same as above.
>
> Please tell me if any further details are required.
>
> solr.log <http://lucene.472066.n3.nabble.com/file/n4314813/solr.log>
> Browser_result.txt <http://lucene.472066.n3.nabble.com/file/n4314813/Browser_result.txt>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Shard-Splitting-Issue-tp4314145p4314813.html
> Sent from the Solr - User mailing list archive at Nabble.com.
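[A note for readers: the CLUSTERSTATUS check suggested above can be done mechanically rather than by eye. The sketch below assumes a simplified response shape that mirrors the shard states in the logs; it is illustrative, not a capture from a live cluster:]

```python
import json

# Illustrative fragment of a CLUSTERSTATUS-style response after a completed
# split; the field layout here is a simplified assumption.
raw = """
{"cluster": {"collections": {"collection1": {"shards": {
  "shard1":   {"state": "inactive"},
  "shard1_0": {"state": "active"},
  "shard1_1": {"state": "active"}
}}}}}
"""

def shard_states(cluster_status, collection):
    """Map shard name -> state for one collection."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    return {name: info["state"] for name, info in shards.items()}

states = shard_states(json.loads(raw), "collection1")
print(states)  # a finished split leaves the parent inactive, children active
```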
Re: Solr Shard Splitting Issue
Hi Anshum,

Thanks for the reply.

I had a copy of the data I was experimenting on, and in any case I repeated the experiment after I posted the mail. Some points I want to let you know:

1. This time I did not change the state in state.json.
2. Otherwise I did the same steps as above, and the data still got frozen at 24 GB in both child shards (my parent shard had ~60 GB).
3. state.json is still showing:
   3.1 Parent - Active
   3.2 Children - Construction
4. Yes, I do have logs; I am attaching the file to this mail. Please check it out.
5. I did the shard split with this command in the browser:
   http://10.1.1.78:4983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
   and I got a timeout exception in the browser. I am attaching a file with what the browser displayed.
6. The details of the system (Amazon EC2 instance) on which I am doing the above steps:
   6.1 30 GB RAM
   6.2 4 cores
   6.3 250 GB drive
7. Lastly, I googled the timeout exception I got and found a reply by you on a post about the same issue, where you mentioned issuing the split shard command asynchronously. I tried that too; I no longer got the timeout exception from the browser, but everything else was the same as above.

Please tell me if any further details are required.

solr.log <http://lucene.472066.n3.nabble.com/file/n4314813/solr.log>
Browser_result.txt <http://lucene.472066.n3.nabble.com/file/n4314813/Browser_result.txt>

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Shard-Splitting-Issue-tp4314145p4314813.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Shard Splitting Issue
Hi Ekta,

Rule #1 - You shouldn't forcefully and manually change the state unless you know what you're doing and have performed all the checks.

It seems the child shards were still being created, i.e. still copying the entire index from the parent shard, when you manually switched the state. One possible reason is that you ran out of disk on the leader node. You might get more information about that by looking at the logs, and from any cluster management tool you use that tracks metrics like disk usage.

The shard split actually creates the two subshards on the same node as the original parent, practically duplicating the data in a separate set of index directories. Did you send more updates while this was going on?

You might still be able to restore things from the original parent by changing the cluster state back to how it was before you issued SPLITSHARD (with only the parent shard, in the active state). Before you do anything, I'd suggest you copy the indexes.

If you have any error logs, it would be good to share them here on the list (if you can). Make sure you upload them to a file-sharing service instead of sending them as attachments to the mailing list.

-Anshum

On Mon, Jan 16, 2017 at 2:33 AM Ekta Bhalwara wrote:

> Hi,
>
> I tried shard splitting with the 6.3 version of Solr, with the following
> steps:
>
> Step 1: I issued
> "collections?action=SPLITSHARD&collection=collection1&shard=shard1".
>
> Step 2: I saw two child shards get created, shard1_0 and shard1_1.
>
> Step 3: After step 2 completed, I still see
>
> shard1 state: active
>
> and
>
> shard1_0 and shard1_1 state: construction
>
> I checked the state in state.json for nearly 48 hours, but the data copying
> froze on reaching a certain point (for example, with 60 GB of data in the
> parent node, after splitting, both child nodes got 24 GB of data, and then
> the copying into the children stopped). The state.json file was not changing
> further.
>
> Moreover, when I manually changed state.json (parent node to inactive from
> active and child nodes to active from construction), I suffered a huge loss
> of data. Please look into the issue from your side and let me know if any
> further information is required.
>
> --
> Thanks & Regards
> Ekta
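[Since the split duplicates the parent index on the same node, one quick pre-split sanity check implied by the advice above is free disk versus index size. A hedged sketch; the headroom factor is a rough rule of thumb, not an official Solr recommendation:]

```python
import shutil

def enough_room_for_split(index_dir, index_size_bytes, headroom=2.0):
    """Rough pre-split check: the node needs roughly one extra copy of the
    parent index (children are built alongside it), plus some slack."""
    free = shutil.disk_usage(index_dir).free
    return free >= headroom * index_size_bytes

# e.g. a 70 GB parent index would want ~140 GB free with headroom=2.0
```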
Solr Shard Splitting Issue
Hi,

I tried shard splitting with the 6.3 version of Solr, with the following steps:

Step 1: I issued "collections?action=SPLITSHARD&collection=collection1&shard=shard1".

Step 2: I saw two child shards get created, shard1_0 and shard1_1.

Step 3: After step 2 completed, I still see

shard1 state: active

and

shard1_0 and shard1_1 state: construction

I checked the state in state.json for nearly 48 hours, but the data copying froze on reaching a certain point (for example, with 60 GB of data in the parent node, after splitting, both child nodes got 24 GB of data, and then the copying into the children stopped). The state.json file was not changing further.

Moreover, when I manually changed state.json (parent node to inactive from active and child nodes to active from construction), I suffered a huge loss of data. Please look into the issue from your side and let me know if any further information is required.

--
Thanks & Regards
Ekta
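[For readers: the end state a successful split should reach (parent inactive, both children active) can be checked mechanically instead of watching state.json by eye for days. A minimal sketch, with the parent/child naming taken from the steps above:]

```python
def split_finished(shard_states, parent="shard1"):
    """True once the parent has been flipped to inactive and both child
    shards (parent_0, parent_1) have been promoted to active."""
    children = (parent + "_0", parent + "_1")
    return (shard_states.get(parent) == "inactive"
            and all(shard_states.get(c) == "active" for c in children))

# A mid-split snapshot like the one described above is not finished:
print(split_finished({"shard1": "active",
                      "shard1_0": "construction",
                      "shard1_1": "construction"}))  # False
```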