ChinmaySKulkarni commented on a change in pull request #4: PHOENIX-5238 Provide
an option to pass hints with PhoenixRDD and Data…
URL: https://github.com/apache/phoenix-connectors/pull/4#discussion_r279934637
##########
File path:
phoenix-spark/src/main/java/org/apache/phoenix/spark/datasource/v2/reader/PhoenixDataSourceReader.java
##########
@@ -148,6 +150,9 @@ public StructType readSchema() {
// Optimize the query plan so that we potentially use secondary indexes
final QueryPlan queryPlan = pstmt.optimizeQuery(selectStatement);
final Scan scan = queryPlan.getContext().getScan();
+ if (this.disableBlockCache) {
+ scan.setCacheBlocks(false);
Review comment:
The `scan` variable is unused. You can actually remove it. You should be
setting this on each scan in the queryPlan; otherwise the scans run by the
Spark executors will not have this hint set. Instead of iterating over each
scan here, it may be easier to store the flag in `PhoenixDataSourceReadOptions`.
We create an instance of this when we call
`PhoenixDataSourceReader#planInputPartitions()` from the driver. These options
are also embedded in each of our InputPartitions, so they are available to us
on the Spark executors (see `PhoenixInputPartitionReader#initialize()`). There
we already iterate over the scans, so you can use the value stored in the read
options to call `setCacheBlocks(false)` on each of them.
Also, if this hint is provided, you should make sure any other scan objects
used on the driver also have this property set, for example the scan that we
use on the driver side to get the region locations.
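To illustrate the suggestion, here is a minimal, self-contained sketch of
carrying the flag in the read options and applying it to every scan. The
`Scan` and `ReadOptions` classes below are hypothetical stand-ins (the real
ones are HBase's `Scan` and `PhoenixDataSourceReadOptions`), and `applyHints`
and `isDisableBlockCache` are names I made up for the example, not existing
methods in the connector.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

class BlockCacheHintSketch {

    /** Hypothetical stand-in for HBase's Scan; block caching is on by default. */
    static final class Scan {
        private boolean cacheBlocks = true;

        void setCacheBlocks(boolean cacheBlocks) {
            this.cacheBlocks = cacheBlocks;
        }

        boolean getCacheBlocks() {
            return cacheBlocks;
        }
    }

    /**
     * Hypothetical stand-in for the read options that are created on the
     * driver and shipped to the executors inside each input partition.
     */
    static final class ReadOptions implements Serializable {
        private final boolean disableBlockCache;

        ReadOptions(boolean disableBlockCache) {
            this.disableBlockCache = disableBlockCache;
        }

        boolean isDisableBlockCache() {
            return disableBlockCache;
        }
    }

    /** Applies the hint to every scan in the list, not just the first one. */
    static void applyHints(List<Scan> scans, ReadOptions options) {
        if (!options.isDisableBlockCache()) {
            return; // hint not requested; keep each scan's existing setting
        }
        for (Scan scan : scans) {
            scan.setCacheBlocks(false);
        }
    }

    public static void main(String[] args) {
        List<Scan> scans = Arrays.asList(new Scan(), new Scan(), new Scan());
        applyHints(scans, new ReadOptions(true));
        for (Scan scan : scans) {
            System.out.println("cacheBlocks = " + scan.getCacheBlocks());
        }
    }
}
```

The same helper could be called for the driver-side scan that is used to get
the region locations, so both code paths read the flag from one place.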