[ https://issues.apache.org/jira/browse/HDDS-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

George Huang updated HDDS-8289:
-------------------------------
    Description: 
Hive:
Hive 3.1.3000.7.1.8.11-3
Git git://centos7-builds-9d1tl/grid/0/jenkins/workspace/workspace/CDH-parallel-centos7/SOURCES/hive -r 12b8607f399cafbfa69950daad17d927cf97629d
Compiled by jenkins on Wed Dec 7 16:16:55 UTC 2022
From source with checksum d5ec9df9f665c988e78e4d6aee1ff544

Ozone:
Using HDDS 1.3.0.718.1.0-b21
Source code repository [email protected]:CDH/ozone.git -r df5938c423df684e6a1d6ccab002bec323ee7db1
Compiled by jenkins on 2023-03-02T16:52Z
Compiled with protoc 2.5.0, 3.19.6 and 3.7.1
From source with checksum 1a9d4a2bf6acc652de6c29241163a63f

 
*Context:*
Some queries run on Ozone were much slower than on HDFS (1 TB TPC-DS dataset, ORC format).

*Issue:*
getSplits in the AM (application master) is very slow, and this is what slows the queries down. Although getSplits runs multithreaded for ORC, it is still slow because of its internal listStatus calls. Details on why listStatus calls are slow on the Ozone side are still to be gathered.

Attaching the AM logs for later reference.

AM log snippet for Q44, which spent more than 36 seconds in getSplits:
{noformat}
2023-03-26 21:10:04,421 [INFO] [App Shared Pool - #25] |orc.OrcInputFormat|: getSplits finished (#splits: 44). duration: 36595 ms
2023-03-26 21:10:04,594 [INFO] [App Shared Pool - #24] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 36727 ms
2023-03-26 21:10:04,698 [INFO] [App Shared Pool - #27] |orc.OrcInputFormat|: getSplits finished (#splits: 44). duration: 36816 ms
2023-03-26 21:10:04,855 [INFO] [App Shared Pool - #26] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 36907 ms
2023-03-26 21:10:05,051 [INFO] [App Shared Pool - #28] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 37070 ms
{noformat}
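To compare getSplits timings across queries, or between Ozone and HDFS runs, the durations can be pulled out of the attached AM logs. The following is a minimal Python sketch, assuming the OrcInputFormat log line format shown in the Q44 snippet; `parse_getsplits`, `summarize` and `SplitTiming` are hypothetical helper names, not part of Hive, Tez or Ozone.

```python
"""Extract getSplits timings from Hive ORC AM log lines (illustrative sketch)."""
from __future__ import annotations

import re
from typing import NamedTuple


class SplitTiming(NamedTuple):
    timestamp: str
    thread: str
    num_splits: int
    duration_ms: int


# One OrcInputFormat "getSplits finished" record (format assumed from the
# Q44 snippet). \s+ between fields also tolerates a record wrapped onto two
# lines, as happens when logs are pasted into e-mail.
_GETSPLITS_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+\[INFO\]\s+"
    r"\[(?P<thread>[^\]]+)\]\s+\|orc\.OrcInputFormat\|:\s+"
    r"getSplits finished \(#splits: (?P<splits>\d+)\)\.\s+"
    r"duration: (?P<ms>\d+) ms"
)


def parse_getsplits(log_text: str) -> list[SplitTiming]:
    """Return every getSplits record found in log_text, in log order."""
    return [
        SplitTiming(m["ts"], m["thread"], int(m["splits"]), int(m["ms"]))
        for m in _GETSPLITS_RE.finditer(log_text)
    ]


def summarize(timings: list[SplitTiming]) -> dict[str, int]:
    """Aggregate call count, total splits and min/max duration in ms."""
    if not timings:
        raise ValueError("no getSplits records found")
    durations = [t.duration_ms for t in timings]
    return {
        "calls": len(timings),
        "total_splits": sum(t.num_splits for t in timings),
        "min_ms": min(durations),
        "max_ms": max(durations),
    }


# The Q44 snippet from this issue.
Q44_LOG = """\
2023-03-26 21:10:04,421 [INFO] [App Shared Pool - #25] |orc.OrcInputFormat|: getSplits finished (#splits: 44). duration: 36595 ms
2023-03-26 21:10:04,594 [INFO] [App Shared Pool - #24] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 36727 ms
2023-03-26 21:10:04,698 [INFO] [App Shared Pool - #27] |orc.OrcInputFormat|: getSplits finished (#splits: 44). duration: 36816 ms
2023-03-26 21:10:04,855 [INFO] [App Shared Pool - #26] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 36907 ms
2023-03-26 21:10:05,051 [INFO] [App Shared Pool - #28] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 37070 ms
"""

if __name__ == "__main__":
    print(summarize(parse_getsplits(Q44_LOG)))
    # {'calls': 5, 'total_splits': 5839, 'min_ms': 36595, 'max_ms': 37070}
```

Subtracting each duration from its finish timestamp, and assuming the duration is measured from the start of each call, puts all five Q44 calls starting within about 0.2 s of each other (21:09:27.826 to 21:09:27.981). The calls therefore ran concurrently, so Q44's split generation cost roughly 37 s of wall-clock time rather than the sum of the five durations.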



> get splits in tpcds queries are way higher (10-30+ seconds) causing slowness 
> on FSO bucket
> ------------------------------------------------------------------------------------------
>
>                 Key: HDDS-8289
>                 URL: https://issues.apache.org/jira/browse/HDDS-8289
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: George Huang
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
