Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1471#discussion_r151827453
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
@@ -687,16 +689,17 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // get tokens for all the required FileSystem for table path
     TokenCache.obtainTokensForNamenodes(job.getCredentials(),
         new Path[] { new Path(absoluteTableIdentifier.getTablePath()) },
         job.getConfiguration());
-
-    TableDataMap blockletMap = DataMapStoreManager.getInstance()
-        .getDataMap(absoluteTableIdentifier, BlockletDataMap.NAME,
-            BlockletDataMapFactory.class.getName());
+    boolean distributedCG = Boolean.parseBoolean(CarbonProperties.getInstance()
+        .getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
+            CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
+    TableDataMap blockletMap =
+        DataMapStoreManager.getInstance().chooseDataMap(absoluteTableIdentifier);
     DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
     List<ExtendedBlocklet> prunedBlocklets;
-    if (dataMapJob != null) {
+    if (distributedCG || blockletMap.getDataMapFactory().getDataMapType() == DataMapType.FG) {
--- End diff --
It seems distributedCG and FG behave the same here, right?
Earlier I thought the FG datamap would be invoked in ScanRDD.compute, but it seems it is not?
If we collect all pruned blocklets in the driver, won't that be too many for the driver to hold?
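To make the question concrete, here is a minimal sketch of the branch being discussed. All names here (PruningSketch, Blocklet, prune) are hypothetical stand-ins, not the actual CarbonData API; it only mirrors the condition `distributedCG || dataMapType == FG` from the diff, under the assumption that the distributed path runs pruning in a job on the executors while the other path collects every pruned blocklet in the driver.

```java
import java.util.ArrayList;
import java.util.List;

public class PruningSketch {
  // Stand-in for CarbonData's DataMapType (coarse-grained vs fine-grained).
  enum DataMapType { CG, FG }

  // Stand-in for an ExtendedBlocklet reference that the driver holds after pruning.
  static class Blocklet {
    final String path;
    Blocklet(String path) { this.path = path; }
  }

  /**
   * Mirrors the condition in the diff: if distributed CG pruning is enabled,
   * or the chosen datamap is fine-grained (FG), pruning is pushed to a
   * distributed job; otherwise the driver prunes and collects everything.
   */
  static List<Blocklet> prune(boolean distributedCG, DataMapType type) {
    if (distributedCG || type == DataMapType.FG) {
      return distributedPrune();  // executors prune, driver receives the result
    }
    return driverPrune();         // driver prunes and holds all blocklets itself
  }

  // Toy stand-in: the distributed job returns an already-reduced result.
  static List<Blocklet> distributedPrune() {
    List<Blocklet> result = new ArrayList<>();
    result.add(new Blocklet("part-0/blocklet-0"));
    return result;
  }

  // Toy stand-in: driver-side pruning accumulates every surviving blocklet,
  // which is the scalability concern raised in the comment above.
  static List<Blocklet> driverPrune() {
    List<Blocklet> result = new ArrayList<>();
    result.add(new Blocklet("part-0/blocklet-0"));
    result.add(new Blocklet("part-0/blocklet-1"));
    return result;
  }

  public static void main(String[] args) {
    System.out.println(prune(true, DataMapType.CG).size());
    System.out.println(prune(false, DataMapType.CG).size());
  }
}
```

Under this reading, distributedCG and FG do take the same code path, which is why the comment asks whether they behave identically and where the FG work actually runs.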
---