jichen20210919 commented on code in PR #79:
URL: https://github.com/apache/phoenix-connectors/pull/79#discussion_r873477213
##########
phoenix-hive-base/src/main/java/org/apache/phoenix/hive/mapreduce/PhoenixInputFormat.java:
##########
@@ -121,73 +126,192 @@ public InputSplit[] getSplits(JobConf jobConf, int numSplits) throws IOException
private List<InputSplit> generateSplits(final JobConf jobConf, final QueryPlan qplan,
final List<KeyRange> splits,
String query) throws IOException {
- if (qplan == null){
+ if (qplan == null) {
throw new NullPointerException();
- }if (splits == null){
+ }
+ if (splits == null) {
throw new NullPointerException();
}
final List<InputSplit> psplits = new ArrayList<>(splits.size());
- Path[] tablePaths = FileInputFormat.getInputPaths(ShimLoader.getHadoopShims()
- .newJobContext(new Job(jobConf)));
- boolean splitByStats = jobConf.getBoolean(PhoenixStorageHandlerConstants.SPLIT_BY_STATS,
+ final Path[] tablePaths = FileInputFormat.getInputPaths(
+ ShimLoader.getHadoopShims().newJobContext(new Job(jobConf)));
+ final boolean splitByStats = jobConf.getBoolean(
+ PhoenixStorageHandlerConstants.SPLIT_BY_STATS,
false);
-
+ final int parallelThreshould = jobConf.getInt(
+ "hive.phoenix.split.parallel.threshold",
+ 32);
setScanCacheSize(jobConf);
+ if (
+ (parallelThreshould <= 0)
+ ||
+ (qplan.getScans().size() < parallelThreshould)
+ ) {
+ LOG.info("generate splits in serial");
+ for (final List<Scan> scans : qplan.getScans()) {
+ psplits.addAll(
+ generateSplitsInternal(
+ jobConf,
+ qplan,
+ splits,
+ query,
+ scans,
+ splitByStats,
+ tablePaths)
+ );
+ }
+ } else {
+ final int parallism = jobConf.getInt(
Review Comment:
The two settings do different jobs. The parallelism level config controls how
many worker threads the parallel split method uses. The parallel threshold
controls which split-generation method runs in the first place: serial or
parallel.
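
To make the distinction concrete, here is a minimal sketch of how the two values could interact. It is not the PR's code: `SplitModeSketch`, `useSerial`, `splitsFor`, and `generate` are hypothetical names, `splitsFor` stands in for `generateSplitsInternal`, and the parallelism level is passed as a plain `int` because its config key is cut off in the quoted hunk. Only the threshold condition (`<= 0` or scan count below the threshold means serial) is taken from the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; names here are not from the PR.
final class SplitModeSketch {

    // Threshold decides WHICH method runs. Mirrors the condition in the diff:
    // a non-positive threshold, or fewer scan groups than the threshold,
    // selects the serial path.
    static boolean useSerial(int scanGroupCount, int parallelThreshold) {
        return parallelThreshold <= 0 || scanGroupCount < parallelThreshold;
    }

    // Stand-in for generateSplitsInternal: one fake split per scan group.
    static List<String> splitsFor(int scanGroup) {
        List<String> out = new ArrayList<>();
        out.add("split-" + scanGroup);
        return out;
    }

    // Parallelism decides HOW MANY worker threads the parallel path uses.
    static List<String> generate(int scanGroupCount, int parallelThreshold, int parallelism) {
        List<String> result = new ArrayList<>();
        if (useSerial(scanGroupCount, parallelThreshold)) {
            for (int i = 0; i < scanGroupCount; i++) {
                result.addAll(splitsFor(i));
            }
            return result;
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, parallelism));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < scanGroupCount; i++) {
                final int group = i;
                Callable<List<String>> task = () -> splitsFor(group);
                futures.add(pool.submit(task));
            }
            // Collect in submission order so the output order matches the serial path.
            for (Future<List<String>> f : futures) {
                result.addAll(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while generating splits", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("split generation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return result;
    }
}
```

With the default threshold of 32 from the diff, `generate(10, 32, 4)` would stay on the serial path and never create a thread pool, while `generate(100, 32, 4)` would use four workers. Setting the threshold to 0 forces the serial path regardless of the parallelism value.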