[
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052036#comment-17052036
]
Peter Vary commented on HIVE-22964:
-----------------------------------
Hi [~aditya-shah],
* I am not a big fan of renaming configuration variables. They can wreak havoc
when upgrading a cluster.
* Sorry, I missed the error handling part. My bad :(. Still, this highlights
why it is good practice to put the try/catch around only the part of the code
where the exception can actually be thrown:
{code:java}
for (Future<MMPathInfo> pathFuture : pathFutures) {
  finalPaths.addAll(pathFuture.get().getFinalPaths());
  pathsWithFileOriginals.addAll(pathFuture.get().getPathsWithFileOriginals());
}
{code}
* Why are we using ugi.doAs? I checked the other file-related pool
implementations and did not find any place where it is used.
* Usually it is a nightmare to synchronize guava versions between projects, so
I prefer to use it only when it is really useful. According to the docs,
Lists.newArrayList() should be treated as deprecated
([https://guava.dev/releases/19.0/api/docs/com/google/common/collect/Lists.html#newArrayList()]).
Is there a specific reason to use it here instead of the standard Java new
ArrayList()?
* Maybe, if we used lambdas for submitting the tasks, we could get rid of
the ProcessForWriteIdsForMmReadCallable / MMPathInfo objects. What do you think?
* Also, when we have output from the Yetus run, please check the
checkstyle/findbugs results for any newly introduced warnings.
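
To make the try/catch and lambda points above concrete, here is a rough sketch rather than a proposal for the actual patch. Everything named in it (PathListingSketch, listFinalPaths, collectFinalPaths, the "delta_N" path strings) is hypothetical and stands in for the real listing logic. It only shows tasks submitted as lambdas and the try/catch scoped to Future.get(), which is called once per future:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from the Hive code base.
class PathListingSketch {

  // Stand-in for the per-directory work; the real patch would list the file system here.
  static List<String> listFinalPaths(String dir) {
    List<String> result = new ArrayList<>();
    result.add(dir + "/delta_1");
    result.add(dir + "/delta_2");
    return result;
  }

  static List<String> collectFinalPaths(List<String> dirs, int poolSize) throws IOException {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      // Submit each task as a lambda, so no dedicated Callable class is needed.
      List<Future<List<String>>> futures = new ArrayList<>();
      for (String dir : dirs) {
        Callable<List<String>> task = () -> listFinalPaths(dir);
        futures.add(pool.submit(task));
      }

      List<String> finalPaths = new ArrayList<>();
      for (Future<List<String>> future : futures) {
        List<String> paths;
        // The try/catch wraps only the call that can throw these exceptions.
        try {
          paths = future.get();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // restore the interrupt flag
          throw new IOException("Interrupted while listing paths", e);
        } catch (ExecutionException e) {
          throw new IOException("Listing paths failed", e.getCause());
        }
        finalPaths.addAll(paths);
      }
      return finalPaths;
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Because the futures are read in submission order, the collected paths come out in the same order as the input directories, whichever worker thread finishes first.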
Thanks,
Peter
> MM table split computation is very slow
> ---------------------------------------
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
> Issue Type: Improvement
> Reporter: Aditya Shah
> Assignee: Aditya Shah
> Priority: Major
> Attachments: HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we
> end up doing listing on the whole table at once. This could be optimized.