[ https://issues.apache.org/jira/browse/HDDS-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
George Huang updated HDDS-8289:
-------------------------------
Description:
Hive:
{noformat}
Hive 3.1.3000.7.1.8.11-3
Git git://xxxxxxx-xxxxxx-xxxxx/xxxx/0/jenkins/workspace/xxxxxxxxx/XXX-xxxxxxxx-xxxxxxx/XXXXXXX/hive -r 12b8607f399cafbfa69950daad17d927cf97629d
Compiled by xxxxxxx on Wed Dec 7 16:16:55 UTC 2022
From source with checksum d5ec9df9f665c988e78e4d6aee1ff544
{noformat}
Ozone:
{noformat}
Using HDDS 1.x.x.xxx.x.x-b21
Source code repository [email protected]:XXX/ozone.git -r df5938c423df684e6a1d6ccab002bec323ee7db1
Compiled by xxxxxxx on 2023-03-02T16:52Z
Compiled with protoc 2.5.0, 3.19.6 and 3.7.1
From source with checksum 1a9d4a2bf6acc652de6c29241163a63f
{noformat}
*Context:*
Some TPC-DS queries (1 TB scale, ORC format) run much slower on Ozone than on HDFS.
*Issue:*
getSplits in the AM is very slow, and this accounts for the query slowness. Although getSplits runs multithreaded for ORC, it is still slow because of its internal "listStatus" calls. Why listStatus is slow on the Ozone side has not been determined yet.
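If split generation runs its listings on a thread pool, its wall-clock time is still bounded below by roughly (number of listings / threads) x per-call latency, so parallelism only partly hides a slow listStatus. A minimal harness for separating per-call listing latency from wall-clock time might look like the following. It is a sketch under stated assumptions: `profile_listings`, `timed_listing` and `local_list_dir` are hypothetical helpers, and `os.scandir` on a local path merely stands in for Ozone's listStatus, which would need a Hadoop or Ozone client that is not shown here.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple


def timed_listing(path: str, list_dir: Callable[[str], List[str]]) -> Tuple[str, int, float]:
    """List one directory and return (path, entry count, latency in ms)."""
    start = time.perf_counter()
    entries = list_dir(path)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return path, len(entries), elapsed_ms


def local_list_dir(path: str) -> List[str]:
    """Stand-in only (assumption): lists a local directory, not an Ozone bucket."""
    with os.scandir(path) as it:
        return [entry.name for entry in it]


def profile_listings(paths: Iterable[str],
                     list_dir: Callable[[str], List[str]] = local_list_dir,
                     threads: int = 8) -> Dict[str, float]:
    """Run listings on a thread pool and compare per-call latency with wall-clock time."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(lambda p: timed_listing(p, list_dir), paths))
    wall_ms = (time.perf_counter() - wall_start) * 1000.0
    latencies = sorted(r[2] for r in results)
    return {
        "calls": len(results),
        "entries": sum(r[1] for r in results),
        "max_call_ms": latencies[-1] if latencies else 0.0,
        "sum_call_ms": sum(latencies),
        "wall_ms": wall_ms,
    }
```

Passing a `list_dir` that wraps a real Ozone client call would allow `max_call_ms` and `sum_call_ms` to be compared against `wall_ms`: if `wall_ms` stays close to `sum_call_ms / threads`, the time is going into the listings themselves rather than into waiting for a free thread.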
AM logs are attached for later reference. Below is an AM log snippet for Q44, which spent 36+ seconds in getSplits:
Q44:
{noformat}
2023-03-26 21:10:04,421 [INFO] [App Shared Pool - #25] |orc.OrcInputFormat|: getSplits finished (#splits: 44). duration: 36595 ms
2023-03-26 21:10:04,594 [INFO] [App Shared Pool - #24] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 36727 ms
2023-03-26 21:10:04,698 [INFO] [App Shared Pool - #27] |orc.OrcInputFormat|: getSplits finished (#splits: 44). duration: 36816 ms
2023-03-26 21:10:04,855 [INFO] [App Shared Pool - #26] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 36907 ms
2023-03-26 21:10:05,051 [INFO] [App Shared Pool - #28] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 37070 ms
{noformat}
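The snippet above can be summarized mechanically: subtracting each reported duration from its log timestamp gives the time each getSplits call started, which shows whether the calls overlapped. The following is a minimal sketch; `parse_getsplits`, `summarize` and `SplitCall` are hypothetical helpers, and the regular expression is written only against the five OrcInputFormat lines shown here, so other Hive builds or log layouts may need a different pattern.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, List, NamedTuple

# Matches one OrcInputFormat entry from the AM log; \s+ tolerates an entry
# that has been wrapped onto two lines.
ENTRY_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[INFO\] \[([^\]]+)\] "
    r"\|orc\.OrcInputFormat\|:\s+"
    r"getSplits finished \(#splits: (\d+)\)\. duration: (\d+) ms"
)


class SplitCall(NamedTuple):
    thread: str
    splits: int
    duration_ms: int
    finished: datetime
    started: datetime  # derived: log timestamp minus reported duration


def parse_getsplits(log_text: str) -> List[SplitCall]:
    """Extract every getSplits completion line from AM log text."""
    calls = []
    for ts, thread, splits, ms in ENTRY_RE.findall(log_text):
        finished = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
        duration = int(ms)
        calls.append(SplitCall(thread, int(splits), duration, finished,
                               finished - timedelta(milliseconds=duration)))
    return calls


def summarize(calls: List[SplitCall]) -> Dict[str, float]:
    """Aggregate split counts, the slowest call, and how far apart calls started."""
    if not calls:
        return {"calls": 0, "total_splits": 0, "max_ms": 0, "start_spread_ms": 0.0}
    starts = [c.started for c in calls]
    return {
        "calls": len(calls),
        "total_splits": sum(c.splits for c in calls),
        "max_ms": max(c.duration_ms for c in calls),
        # A small spread means the calls ran concurrently rather than back to back.
        "start_spread_ms": (max(starts) - min(starts)) / timedelta(milliseconds=1),
    }
```

On the Q44 snippet this reports 5 calls, 5,839 splits in total, a slowest call of 37,070 ms, and start times within 155 ms of each other (all around 21:09:27.8 to 21:09:28.0), so the five getSplits calls ran concurrently and each took roughly 36.6 to 37.1 seconds.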
was:
Hive:
{noformat}
Hive 3.1.3000.7.1.8.11-3
Git git://centos7-builds-9d1tl/xxxx/0/jenkins/workspace/xxxxxxxxx/CDH-parallel-centos7/SOURCES/hive -r 12b8607f399cafbfa69950daad17d927cf97629d
Compiled by xxxxxxx on Wed Dec 7 16:16:55 UTC 2022
From source with checksum d5ec9df9f665c988e78e4d6aee1ff544
{noformat}
Ozone:
{noformat}
Using HDDS 1.3.0.718.1.0-b21
Source code repository [email protected]:CDH/ozone.git -r df5938c423df684e6a1d6ccab002bec323ee7db1
Compiled by xxxxxxx on 2023-03-02T16:52Z
Compiled with protoc 2.5.0, 3.19.6 and 3.7.1
From source with checksum 1a9d4a2bf6acc652de6c29241163a63f
{noformat}
*Context:*
Some TPC-DS queries (1 TB scale, ORC format) run much slower on Ozone than on HDFS.
*Issue:*
getSplits in the AM is very slow, and this accounts for the query slowness. Although getSplits runs multithreaded for ORC, it is still slow because of its internal "listStatus" calls. Why listStatus is slow on the Ozone side has not been determined yet.
AM logs are attached for later reference. Below is an AM log snippet for Q44, which spent 36+ seconds in getSplits:
Q44:
{noformat}
2023-03-26 21:10:04,421 [INFO] [App Shared Pool - #25] |orc.OrcInputFormat|: getSplits finished (#splits: 44). duration: 36595 ms
2023-03-26 21:10:04,594 [INFO] [App Shared Pool - #24] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 36727 ms
2023-03-26 21:10:04,698 [INFO] [App Shared Pool - #27] |orc.OrcInputFormat|: getSplits finished (#splits: 44). duration: 36816 ms
2023-03-26 21:10:04,855 [INFO] [App Shared Pool - #26] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 36907 ms
2023-03-26 21:10:05,051 [INFO] [App Shared Pool - #28] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 37070 ms
{noformat}
> get splits in tpcds queries are way higher (10-30+ seconds) causing slowness
> on FSO bucket
> ------------------------------------------------------------------------------------------
>
> Key: HDDS-8289
> URL: https://issues.apache.org/jira/browse/HDDS-8289
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: George Huang
> Assignee: Ritesh Shukla
> Priority: Critical
--
This message was sent by Atlassian Jira
(v8.20.10#820010)