Github user jinfengni commented on the issue:

    https://github.com/apache/drill/pull/968
  
    I'm not convinced that it's a good idea to back out the change to HBase 
specific changes made in DRILL-5546.
    
    You are right that project push-down in planner ideally should do its job 
and push the list of columns. However, until planner could claim there is no 
issues at all ( just like what happened prior to DRILL-5546), execution still 
may face a columns list with "*". That's why in HBaseGroupScan we have 
verifyColumnsAndConvertStar, just in case planner's project push-down did not 
work in the way we want.  If as you suggested, such conversion in 
HBaseGroupScan is redundant, why would it cause regression (if planner's rule 
works as expected)? or is it your intention to still keep "*" in 
HBaseRecordReader? If that's your intention, I think we are going in the wrong 
direction. HBase has a unified schema at table level.  
    
    I agree that the analysis of empty map {},  vs {a:varbinary}.  It's 
something we have to deal with. As a matter of fact, such scenarios does not 
have to come from empty batch. It could happen with two regions with >0 rows.  
    
    For instance, regrion 1 has 10 rows, with cf1.c1 appears in only first 5 
rows, while region 2 has 20 rows with cf1.c1 appears in every rows. For the 
following query:
    select CF1 FROM table where some_condition_on_row_key;
    
    if "some_condition_on_row_key" is pushed to hbase and prunes the first 5 
rows, region1 will return a batch with 5 rows, but with cf1 as an empty map, 
while region2 will have map with cf1 as {c1:varbinary}.  
    
    In that sense, DRILL-5546 exposes such issues, and force us to have a 
solution to handle empty map {} vs {a:varbinary}/



---

Reply via email to