GitHub user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16695690
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    How about something like this:
    
    * If there's a single region, we don't try to divide it.
    * When a table splits, we initiate an update stats call on it. Can we
      capture that in a coprocessor? If not, perhaps we can infer that a
      split happened by comparing the current stats against the number of
      regions.
    * We have the concept of a "minimum time to recalc stats" as a separate
      config, and we store the time the stats were last calculated in the
      stats table. This prevents analyze stats from being run too often:
      it wouldn't be dangerous to invoke analyze stats here on the client,
      even if many clients do it, because every call after the first
      within that window would be a no-op.
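
    The "minimum time to recalc stats" idea could be sketched roughly as
    below. This is only an illustration, not Phoenix code: the class and
    method names (StatsRecalcGate, shouldDivide, splitSuspected,
    tryRecalc) are hypothetical, and an in-memory AtomicLong stands in
    for the last-calculated timestamp that would really live in the
    stats table.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; none of these names exist in Phoenix. */
class StatsRecalcGate {
    // Assumed config value: the "minimum time to recalc stats", in milliseconds.
    private final long minRecalcIntervalMs;
    // Stand-in for the "last calculated" timestamp stored in the stats table.
    private final AtomicLong lastCalcMs;

    StatsRecalcGate(long minRecalcIntervalMs, long lastCalcMs) {
        this.minRecalcIntervalMs = minRecalcIntervalMs;
        this.lastCalcMs = new AtomicLong(lastCalcMs);
    }

    /** A single region is never divided. */
    static boolean shouldDivide(int regionCount) {
        return regionCount > 1;
    }

    /** Infers a split: more regions exist now than when stats were last collected. */
    static boolean splitSuspected(int regionsAtLastStats, int currentRegions) {
        return currentRegions > regionsAtLastStats;
    }

    /**
     * Runs the analyze action only if the minimum interval has elapsed since
     * the last run. Returns false (a no-op) otherwise, or if another caller
     * claimed the run first.
     */
    boolean tryRecalc(long nowMs, Runnable analyze) {
        long last = lastCalcMs.get();
        if (nowMs - last < minRecalcIntervalMs) {
            return false; // too soon: later callers are no-ops
        }
        if (!lastCalcMs.compareAndSet(last, nowMs)) {
            return false; // a concurrent caller won the race
        }
        analyze.run();
        return true;
    }
}
```

    With a gate like this, a client that suspects a split can call
    tryRecalc unconditionally; only the first caller in each window does
    any work. In a real deployment the compare-and-set would have to be
    done against the stats table itself so that it holds across clients.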

