[ https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054325#comment-17054325 ]
Peter Vary commented on HIVE-22964:
-----------------------------------
Hi [~aditya-shah],
* I have found this for renaming the configuration key:
[https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/conf/Configuration.html#addDeprecation-java.lang.String-java.lang.String-java.lang.String-]
We should check that this works as advertised; if it does, we can go ahead and
rename the configuration key.
* HIVE-13120: The last comment states:
{quote}Since the ORCInputformat is cached in `FetchOperator.java`, the UGI in
`Context.threadpool` thread will be userA always.
{quote}
This suggests to me that the problem there was caching the ORCInputFormat. Do
we have a similar problem here?
* MMPathInfo: We could simply pass two synchronizedLists (or some "Concurrent"
collection) as the {{finalPaths}} and {{pathsWithFileOriginals}} parameters of
the processPathsForMmRead method, and get away without the extra objects. Or did
you see a serious performance degradation caused by the synchronization?
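For the {{addDeprecation}} point, "working as advertised" could be checked roughly as follows, assuming the old key is meant to behave as an alias of the new one: a value set under either key should be visible under both. The {{DeprecatingConfig}} class and the key names below are hypothetical; this is a Hadoop-free model of the expectations only, and the real test should run against {{org.apache.hadoop.conf.Configuration}} itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Hadoop-free model of the deprecation behaviour we would want
// to confirm in org.apache.hadoop.conf.Configuration. It is NOT Hadoop's
// implementation; it only encodes the expectations a real test should check.
class DeprecatingConfig {
    // deprecated (old) key -> replacement (new) key
    private final Map<String, String> deprecations = new HashMap<>();
    // values are always stored under the resolved (new) key
    private final Map<String, String> values = new HashMap<>();

    void addDeprecation(String oldKey, String newKey) {
        deprecations.put(oldKey, newKey);
    }

    // Map a possibly deprecated key onto the key that actually holds the value.
    private String resolve(String key) {
        return deprecations.getOrDefault(key, key);
    }

    void set(String key, String value) {
        values.put(resolve(key), value);
    }

    String get(String key) {
        return values.get(resolve(key));
    }
}

class DeprecationDemo {
    public static void main(String[] args) {
        DeprecatingConfig conf = new DeprecatingConfig();
        conf.addDeprecation("hive.example.old.key", "hive.example.new.key");
        // A user who still sets the old key...
        conf.set("hive.example.old.key", "true");
        // ...should see the value through the new key, and through the old one.
        System.out.println(conf.get("hive.example.new.key")); // true
        System.out.println(conf.get("hive.example.old.key")); // true
    }
}
```

The same two assertions (old-to-new and new-to-old visibility) are the ones worth porting to a real unit test against Hadoop's class.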
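To make the HIVE-13120 question concrete: the failure quoted there follows the general pattern of caching an object whose user identity is fixed when it (or its thread pool) is created, so later callers silently run as the first user. The classes below are a simplified, hypothetical model that uses plain strings instead of UGI and no real ORCInputFormat; the question for this patch is whether anything on the new split-computation path is cached and reused across users in this way.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model: an object that captures the calling user at creation.
class UserBoundFormat {
    final String capturedUser; // identity fixed at construction time

    UserBoundFormat(String currentUser) {
        this.capturedUser = currentUser;
    }

    // Any work done through this object runs as the captured user.
    String runAs() {
        return capturedUser;
    }
}

class FormatCache {
    private final Map<String, UserBoundFormat> cache = new ConcurrentHashMap<>();

    // Problematic: keyed only by format name, so the first caller's identity sticks.
    UserBoundFormat getCached(String formatName, String currentUser) {
        return cache.computeIfAbsent(formatName, k -> new UserBoundFormat(currentUser));
    }

    // Safer: build a fresh instance per request (or include the user in the key).
    UserBoundFormat getFresh(String currentUser) {
        return new UserBoundFormat(currentUser);
    }
}

class CacheDemo {
    public static void main(String[] args) {
        FormatCache formats = new FormatCache();
        formats.getCached("OrcInputFormat", "userA");
        // userB reuses the cached instance and ends up acting as userA.
        System.out.println(formats.getCached("OrcInputFormat", "userB").runAs()); // userA
        System.out.println(formats.getFresh("userB").runAs()); // userB
    }
}
```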
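For the MMPathInfo point, the suggestion amounts to letting the parallel tasks write directly into two shared {{Collections.synchronizedList}} instances rather than each returning a holder object that is merged afterwards. A minimal sketch follows; the {{processPathsForMmRead}} signature here is hypothetical, and the "_orig" suffix rule merely stands in for the real check for original files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: tasks write straight into two shared thread-safe lists
// instead of each returning a holder object (like MMPathInfo) to be merged.
class MmPathSketch {
    // Stand-in for processPathsForMmRead; this signature is illustrative only.
    static void processPathsForMmRead(List<String> dirs,
                                      List<String> finalPaths,
                                      List<String> pathsWithFileOriginals,
                                      ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<?>> futures = new ArrayList<>();
        for (String dir : dirs) {
            futures.add(pool.submit(() -> {
                // Pretend rule: an "_orig" suffix marks a dir with original files.
                if (dir.endsWith("_orig")) {
                    pathsWithFileOriginals.add(dir);
                } else {
                    finalPaths.add(dir);
                }
            }));
        }
        for (Future<?> f : futures) {
            f.get(); // wait for every task and surface any failure
        }
    }

    static List<List<String>> run(List<String> dirs) {
        // Each add() is individually synchronized; no extra holder objects.
        List<String> finalPaths = Collections.synchronizedList(new ArrayList<>());
        List<String> originals = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            processPathsForMmRead(dirs, finalPaths, originals, pool);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        // Insertion order depends on thread scheduling, so sort for stable output.
        List<String> a = new ArrayList<>(finalPaths);
        List<String> b = new ArrayList<>(originals);
        Collections.sort(a);
        Collections.sort(b);
        return List.of(a, b);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("p=1", "p=2_orig", "p=3")));
        // prints [[p=1, p=3], [p=2_orig]]
    }
}
```

A {{ConcurrentLinkedQueue}} would avoid locking entirely, at the cost of no longer being a {{List}}; whether either option is measurably slower than per-task holder objects is the performance question raised above.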
Thanks for taking care of this!
Peter
> MM table split computation is very slow
> ---------------------------------------
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
> Issue Type: Improvement
> Reporter: Aditya Shah
> Assignee: Aditya Shah
> Priority: Major
> Attachments: HIVE-22964.patch
>
>
> Since for MM tables we process the paths before inputFormat.getSplits(), we
> end up listing the whole table at once. This could be optimized.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)