[ 
https://issues.apache.org/jira/browse/HDDS-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854955#comment-17854955
 ] 

Tanvi Penumudy edited comment on HDDS-10970 at 6/14/24 7:26 AM:
----------------------------------------------------------------

From the OM audits, the DELETE_KEY operation over volume: hivewritevol1717048266, bucket: hivebucket1717048272, key: hive_write/vectortab/delta_0000884_0000884_0000 appears to have succeeded at *2024-06-02 20:40:56*, issued by user: hive/vc0117.<domain>, with no re-creation of the key after that.

The error surfaced around *2024-06-03 03:40:41*, when the deleted key was read again.

From the OM audits, filtering for all operations by this user (hive/vc0117.<domain>) shows multiple consecutive DELETE_KEY operations at and before *2024-06-02 20:40*.

They all appear to be sequential deletes under the hive_write/vectortab/ 
directory, including our missing key suffix: delta_0000884_0000884_0000:
{code:java}
[92] delta_0000892_0000892_0000
[91] delete_delta_0000891_0000891_0000
[90] delta_0000890_0000890_0000
[89] delete_delta_0000889_0000889_0000
[88] delta_0000888_0000888_0000
[87] delete_delta_0000887_0000887_0000
[86] delta_0000886_0000886_0000
[85] delete_delta_0000885_0000885_0000
[84] delta_0000884_0000884_0000 (our missing key) 
.......[so on].......
{code}
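For reference, the kind of audit filtering described above can be sketched with a small script. Note that the pipe-separated key=value layout, the field names (ts, user, op, key, ret), and the sample lines below are all assumptions for illustration, not the exact OM audit-log format:

```python
# Sketch: sift OM audit lines for successful DELETE_KEY operations by one user.
# NOTE: the pipe-separated "key=value" layout and field names are an assumed
# log format, not the exact Ozone OM audit format; adjust parsing as needed.

def delete_key_ops(lines, user):
    """Return sorted (timestamp, key) pairs for successful DELETE_KEY ops."""
    hits = []
    for line in lines:
        # Parse each "key=value" segment between the pipe separators.
        fields = dict(
            part.strip().split("=", 1)
            for part in line.split("|")
            if "=" in part
        )
        if (fields.get("op") == "DELETE_KEY"
                and fields.get("user") == user
                and fields.get("ret") == "SUCCESS"):
            hits.append((fields.get("ts"), fields.get("key")))
    return sorted(hits)

# Hypothetical sample lines for illustration only.
sample = [
    "ts=2024-06-02 20:40:56 | user=hive/vc0117 | op=DELETE_KEY | "
    "key=hive_write/vectortab/delta_0000884_0000884_0000 | ret=SUCCESS",
    "ts=2024-06-02 20:40:57 | user=hive/vc0117 | op=CREATE_KEY | "
    "key=unrelated | ret=SUCCESS",
]
print(delete_key_ops(sample, "hive/vc0117"))
```

Sorting by timestamp makes the consecutive DELETE_KEY sequence under hive_write/vectortab/ stand out, as in the list above.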
From the order of deletes, it appears that a delete query on vectortab (with specific conditions) was run at the time, deleting the directory/key, which a later query then referenced (resulting in the error).

Based on the above, this does not appear to be a product bug. Resolving the ticket for now; we can reopen it if needed. Thank you!



> FileNotFoundException encountered while running hive-write in long-running 
> setup
> --------------------------------------------------------------------------------
>
>                 Key: HDDS-10970
>                 URL: https://issues.apache.org/jira/browse/HDDS-10970
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Jyotirmoy Sinha
>            Assignee: Tanvi Penumudy
>            Priority: Major
>
> FileNotFoundException encountered while running hive-write in long-running 
> setup
> Error stacktrace -
> {code:java}
> E         ERROR : Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1717317065380_0872_1_00, diagnostics=[Vertex 
> vertex_1717317065380_0872_1_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: vectortab initializer failed, 
> vertex=vertex_1717317065380_0872_1_00 [Map 1], java.lang.RuntimeException: 
> ORC split generation failed with exception: java.io.FileNotFoundException: 
> Unable to get file status: volume: hivewritevol1717048266 bucket: 
> hivebucket1717048272 key: hive_write/vectortab/delta_0000884_0000884_0000
> E               at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1853)
> E               at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1940)
> E               at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:543)
> E               at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:851)
> E               at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:289)
> E               at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$runInitializer$3(RootInputInitializerManager.java:203)
> E               at java.security.AccessController.doPrivileged(Native Method)
> E               at javax.security.auth.Subject.doAs(Subject.java:422)
> E               at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
> E               at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:196)
> E               at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:177)
> E               at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$createAndStartInitializing$2(RootInputInitializerManager.java:171)
> E               at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> E               at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
> E               at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75)
> E               at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
> E               at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> E               at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> E               at java.lang.Thread.run(Thread.java:748)
> E         Caused by: java.util.concurrent.ExecutionException: 
> java.io.FileNotFoundException: Unable to get file status: volume: 
> hivewritevol1717048266 bucket: hivebucket1717048272 key: 
> hive_write/vectortab/delta_0000884_0000884_0000
> E               at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> E               at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> E               at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1785)
> E               ... 18 more
> E         Caused by: java.io.FileNotFoundException: Unable to get file 
> status: volume: hivewritevol1717048266 bucket: hivebucket1717048272 key: 
> hive_write/vectortab/delta_0000884_0000884_0000
> E               at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> E               at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> E               at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1785)
> E               ... 18 more
> E         Caused by: java.io.FileNotFoundException: Unable to get file 
> status: volume: hivewritevol1717048266 bucket: hivebucket1717048272 key: 
> hive_write/vectortab/delta_0000884_0000884_0000
> E               at 
> org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.listStatus(BasicRootedOzoneClientAdapterImpl.java:929)
> E               at 
> org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.listFileStatus(BasicRootedOzoneFileSystem.java:1250)
> E               at 
> org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.access$300(BasicRootedOzoneFileSystem.java:102)
> E               at 
> org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem$OzoneFileStatusIterator.<init>(BasicRootedOzoneFileSystem.java:1165)
> E               at 
> org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem$OzoneFileStatusIterator.<init>(BasicRootedOzoneFileSystem.java:1147)
> E               at 
> org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.listStatusIterator(BasicRootedOzoneFileSystem.java:1139)
> E               at 
> org.apache.hadoop.hive.common.FileUtils.listStatusIterator(FileUtils.java:1344)
> E               at 
> org.apache.hadoop.hive.ql.io.AcidUtils.getHdfsDirSnapshots(AcidUtils.java:1525)
> E               at 
> org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:1324)
> E               at 
> org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:1292)
> E               at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.getAcidState(OrcInputFormat.java:1256)
> E               at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:1274)
> E               at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.lambda$call$0(OrcInputFormat.java:1245)
> E               at java.security.AccessController.doPrivileged(Native Method)
> E               at javax.security.auth.Subject.doAs(Subject.java:422)
> E               at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
> E               at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:1245)
> E               at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:1210)
> E               at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> E               at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> E               at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> E               ... 3 more
> E         ] {code}
> Error observed after 93 hours of execution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
