[https://issues.apache.org/jira/browse/HDFS-15315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096832#comment-17096832]
Anshuman Singh commented on HDFS-15315:
---------------------------------------
The cluster has 3 datanodes and 1 namenode, and all nodes remain alive. The
problem is always reproducible: when I index 100 documents per second into
Solr, the first 4-5 batches work fine and then indexing starts failing. It
also reproduces if I start by indexing 10,000 documents and committing them
at once.
*These are the logs on the HDFS namenode:*
2020-04-30 21:38:10,081 INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(2912)) - BLOCK* blk_-9223372036854775584_1014 is COMMITTED but not COMPLETE(numNodes= 3 >= minimum = 2) in file /XOR21EC/dummy/core_node2/data/index/_0_Lucene70_0.dvd
2020-04-30 21:38:10,485 INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(2912)) - BLOCK* blk_-9223372036854775584_1014 is COMMITTED but not COMPLETE(numNodes= 3 >= minimum = 2) in file /XOR21EC/dummy/core_node2/data/index/_0_Lucene70_0.dvd
2020-04-30 21:38:11,287 INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(2912)) - BLOCK* blk_-9223372036854775584_1014 is COMMITTED but not COMPLETE(numNodes= 3 >= minimum = 2) in file /XOR21EC/dummy/core_node2/data/index/_0_Lucene70_0.dvd
2020-04-30 21:38:12,890 INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(2912)) - BLOCK* blk_-9223372036854775584_1014 is COMMITTED but not COMPLETE(numNodes= 3 >= minimum = 2) in file /XOR21EC/dummy/core_node2/data/index/_0_Lucene70_0.dvd
2020-04-30 21:38:16,094 INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(2912)) - BLOCK* blk_-9223372036854775584_1014 is COMMITTED but not COMPLETE(numNodes= 3 >= minimum = 2) in file /XOR21EC/dummy/core_node2/data/index/_0_Lucene70_0.dvd
2020-04-30 21:38:22,497 INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(2912)) - BLOCK* blk_-9223372036854775584_1014 is COMMITTED but not COMPLETE(numNodes= 3 >= minimum = 2) in file /XOR21EC/dummy/core_node2/data/index/_0_Lucene70_0.dvd
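The backoff between these repeated namenode lines matches the HDFS client polling completeFile() on close. If this has the same root cause as HDFS-11486, one mitigation commonly suggested for this error (a workaround sketch only, not a fix for the underlying EC issue) is to raise the client-side retry count for completing the last block, via hdfs-site.xml on the client (Solr) side:

{code:xml}
<!-- hdfs-site.xml on the HDFS client (Solr) side.
     Default is 5; each retry waits longer before giving up with
     "Unable to close file because the last block does not have
     enough number of replicas." The value 10 here is illustrative. -->
<property>
  <name>dfs.client.block.write.locateFollowingBlock.retries</name>
  <value>10</value>
</property>
{code}

This only buys the namenode more time to see the block as COMPLETE; it does not address why the EC block group stays COMMITTED.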
*Solr stack trace:*
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:670)
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:684)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1599)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:982)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:971)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:348)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:284)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:234)
    ... 68 more
Caused by: java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2519)
    at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2480)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2445)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
    at org.apache.lucene.store.OutputStreamIndexOutput.close(OutputStreamIndexOutput.java:70)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:88)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:76)
    at org.apache.lucene.codecs.lucene70.Lucene70DocValuesConsumer.close(Lucene70DocValuesConsumer.java:97)
    at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$ConsumerAndSuffix.close(PerFieldDocValuesFormat.java:92)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:88)
    at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.close(PerFieldDocValuesFormat.java:234)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:88)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:76)
    at org.apache.lucene.index.DefaultIndexingChain.writeDocValues(DefaultIndexingChain.java:266)
    at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:133)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:470)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:554)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:719)
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3204)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3449)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3414)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:701)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:93)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1929)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1905)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:281)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:188)
    ... 42 more
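For completeness, the EC setup on the affected directory can be sketched with the stock {{hdfs ec}} CLI (available on Hadoop 3.1.1; the directory path is illustrative):

{code}
# Enable the built-in XOR-2-1-1024k policy cluster-wide, then apply it
# to the directory that Solr writes its index into, and verify it.
hdfs ec -enablePolicy -policy XOR-2-1-1024k
hdfs ec -setPolicy -path /XOR21EC -policy XOR-2-1-1024k
hdfs ec -getPolicy -path /XOR21EC
{code}

With XOR-2-1 (2 data + 1 parity cells), every block group needs all 3 datanodes, so a single slow or busy datanode can leave the last block group COMMITTED at close time.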
> IOException on close() when using Erasure Coding
> ------------------------------------------------
>
> Key: HDFS-15315
> URL: https://issues.apache.org/jira/browse/HDFS-15315
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.1
> Environment: XOR-2-1-1024k policy on hadoop 3.1.1 with 3 datanodes
> Reporter: Anshuman Singh
> Priority: Major
>
> When an Erasure Coding policy is applied to a directory, the replication
> factor is reported as 1. Solr then fails to index documents with the error
> _java.io.IOException: Unable to close file because the last block does not
> have enough number of replicas._ Indexing works fine without EC (with
> replication factor 3). This appears identical to
> [HDFS-11486|https://issues.apache.org/jira/browse/HDFS-11486].