[jira] [Commented] (SOLR-7255) Index Corruption on HDFS whenever online bulk indexing (from Hive)

Hari Sekhon (JIRA) Wed, 18 Mar 2015 04:34:50 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366991#comment-14366991
 ]


Hari Sekhon commented on SOLR-7255:
-----------------------------------

Yes, it was enabled. I disabled it and re-ran the ingest, which got further 
without index corruption. However, indexing on HDFS is so much slower than on 
local disk that the bulk ingest of 620M rows from Hive, which used to take 2 
hours, now runs for 16 hours and then fails with a broken pipe to the server. 
That's a separate issue, though.

Back to this setting: according to this page, I believe 
solr.hdfs.blockcache.write.enabled is still set to true by default:

https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS

If this is buggy, the default should probably be changed to false, and the 
setting re-enabled once it has been fixed and works properly.

Is there another ticket documenting work to fix this HDFS block write cache 
corruption issue (i.e. should we close this JIRA as a duplicate)?

> Index Corruption on HDFS whenever online bulk indexing (from Hive)
> ------------------------------------------------------------------
>
>                 Key: SOLR-7255
>                 URL: https://issues.apache.org/jira/browse/SOLR-7255
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.10.3
>         Environment: HDP 2.2 / HDP Search + LucidWorks hadoop-lws-job.jar
>            Reporter: Hari Sekhon
>            Priority: Blocker
>
> When running SolrCloud on HDFS and using the LucidWorks hadoop-lws-job.jar to 
> index a Hive table (620M rows) into Solr, the job runs for about 1500 seconds 
> and then fails with this exception:
> {code}Exception in thread "Lucene Merge Thread #2191" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: codec header mismatch: actual header=1494817490 vs expected header=1071082519 (resource: BufferedChecksumIndexInput(_r3.nvm))
>         at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:549)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:522)
> Caused by: org.apache.lucene.index.CorruptIndexException: codec header mismatch: actual header=1494817490 vs expected header=1071082519 (resource: BufferedChecksumIndexInput(_r3.nvm))
>         at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:136)
>         at org.apache.lucene.codecs.lucene49.Lucene49NormsProducer.<init>(Lucene49NormsProducer.java:75)
>         at org.apache.lucene.codecs.lucene49.Lucene49NormsFormat.normsProducer(Lucene49NormsFormat.java:112)
>         at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:127)
>         at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:108)
>         at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
>         at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:282)
>         at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:3951)
>         at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3913)
>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3766)
>         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
> {code}
> So I deleted the whole index, re-created it and re-ran the job to send the 
> Hive table contents to Solr again, and it failed with exactly the same 
> exception on the first run, after sending a large number of updates to Solr.
> I then moved off HDFS to a normal local-disk dataDir and re-indexed the full 
> table successfully in 2 hours, with no index corruption.
> This suggests some sort of stability issue in the HDFS DirectoryFactory 
> implementation.
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
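For reference, the "expected header=1071082519" in the trace above is Lucene's CodecUtil.CODEC_MAGIC (0x3FD76C17). Lucene's codec files, including the _r3.nvm norms file here, begin with that 4-byte magic, and CodecUtil.checkHeader throws CorruptIndexException with this exact message when the first int it reads differs. The sketch below is a hypothetical stand-alone check, not Lucene's code (the class and method names are made up), assuming you have a local copy of the segment file (e.g. fetched from HDFS first). It only covers the magic number, not the codec name and version that checkHeader verifies afterwards.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not part of Lucene or Solr) that mimics only the first
 * step of Lucene's CodecUtil.checkHeader: compare a file's leading int with CODEC_MAGIC.
 */
public class CodecHeaderCheck {
    // Same value as Lucene's CodecUtil.CODEC_MAGIC: 0x3FD76C17 == 1071082519,
    // i.e. the "expected header" printed in the exception.
    static final int CODEC_MAGIC = 0x3FD76C17;

    // Decode the first 4 bytes as a big-endian int (ByteBuffer's default byte order),
    // matching how Lucene 4.x writes ints with DataOutput.writeInt.
    static int readHeader(byte[] firstBytes) {
        if (firstBytes.length < 4) {
            throw new IllegalArgumentException("need at least 4 bytes, got " + firstBytes.length);
        }
        return ByteBuffer.wrap(firstBytes, 0, 4).getInt();
    }

    static boolean hasCodecMagic(byte[] firstBytes) {
        return readHeader(firstBytes) == CODEC_MAGIC;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage (assumption): java CodecHeaderCheck /local/copy/of/_r3.nvm");
            return;
        }
        byte[] head = new byte[4];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(Paths.get(args[0])))) {
            in.readFully(head); // throws EOFException if the file is shorter than 4 bytes
        }
        int actual = readHeader(head);
        if (hasCodecMagic(head)) {
            System.out.println("header OK (" + actual + ")");
        } else {
            System.out.println("codec header mismatch: actual header=" + actual
                    + " vs expected header=" + CODEC_MAGIC);
        }
    }
}
```

A healthy codec file should print "header OK"; a value like the 1494817490 in the trace means the first bytes read back are not the magic that Lucene writes at the start of the file.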




