[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #119: Split files when planning scan tasks

GitBox Thu, 07 Mar 2019 13:56:48 -0800

rdblue commented on a change in pull request #119: Split files when planning 
scan tasks
URL: https://github.com/apache/incubator-iceberg/pull/119#discussion_r263585643


 ##########
 File path: spark/src/test/java/com/netflix/iceberg/spark/source/TestFilteredScan.java
 ##########
 @@ -429,7 +429,9 @@ private File buildPartitionedTable(String desc, PartitionSpec spec, String udf,
     Table byId = TABLES.create(SCHEMA, spec, location.toString());
 
     // do not combine splits because the tests expect a split per partition
-    byId.updateProperties().set("read.split.target-size", "1").commit();
+    //TODO: this is commented out since the current patch will create
+    // too many small scan tasks
+    // byId.updateProperties().set("read.split.target-size", "1").commit();
 
 Review comment:
   This property was set to keep splits from being combined. With this patch a
target size of 1 also splits every file into tiny scan tasks, so instead of 1,
set it to a value larger than the file size to avoid splitting. The value can't
be too large, though, or splits get combined again. 2048 appears to work: no
files are split or combined.
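
To illustrate why the target size has to sit in a window, here is a minimal sketch of split-then-combine planning. It is a simplified model, not Iceberg's actual planner: the class `SplitPlanningSketch`, the method `planTasks`, and the 1500-byte file sizes are all hypothetical, and real planning may weigh other factors when combining.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of scan task planning (not Iceberg's code).
public class SplitPlanningSketch {

  // Splits each file into chunks of at most targetSize bytes, then packs
  // consecutive chunks into tasks whose total size stays within targetSize.
  static List<List<Long>> planTasks(long[] fileSizes, long targetSize) {
    List<Long> splits = new ArrayList<>();
    for (long size : fileSizes) {
      for (long offset = 0; offset < size; offset += targetSize) {
        splits.add(Math.min(targetSize, size - offset));
      }
    }

    List<List<Long>> tasks = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentSize = 0;
    for (long split : splits) {
      // Start a new task when adding this split would exceed the target.
      if (!current.isEmpty() && currentSize + split > targetSize) {
        tasks.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(split);
      currentSize += split;
    }
    if (!current.isEmpty()) {
      tasks.add(current);
    }
    return tasks;
  }

  public static void main(String[] args) {
    long[] files = {1500, 1500}; // assumed file sizes, one file per partition
    for (long target : new long[] {1, 2048, 4096}) {
      System.out.println("target=" + target + " tasks=" + planTasks(files, target).size());
    }
  }
}
```

In this model a target of 1 yields one task per byte, 2048 yields exactly one task per file, and 4096 merges both files into a single task, which matches the behavior described in the comment above.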

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
