> On May 12, 2017, 2:18 p.m., Aihua Xu wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 3357-3361 (patched)
> > <https://reviews.apache.org/r/58936/diff/3/?file=1714847#file1714847line3357>
> >
> >     Vihang and Sahil,
> >     
> >     Typically what would cause the batch to fail? Is that because the batch 
> > could be too large? 
> >     
> >     Right now, we are hard-coding decayingFactor to 2. I have another 
> > thought: maybe, given the number of retries, we could calculate the 
> > decayingFactor so that the last retry always processes one partition at a 
> > time, just like what we are doing now. So given batch size 100 and 4 
> > retries: 100, 66, 33, 1? 
> >     
> >     What do you think?

A batch can fail when the network is flaky, or when processing the batch takes 
longer than the metastore client's socket timeout. This is likely to be more 
common with cloud-based datastores like S3. I think what you are proposing is a 
linearly decaying batch size. That may work fine for smaller batch sizes, but 
it may not converge quickly if the batch size is (mis)configured to be much 
higher or is left at the default value of 0. E.g., consider numPartitions = 
10,000 and maxRetries = 10: with your approach the batch sizes will be 10k, 9k, 
8k, 7k, ... which may all be too high. If we decay exponentially, the batches 
will be 10k, 5k, 2.5k, 1.25k, ... which is more likely to succeed.
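
To make the comparison concrete, here is a minimal sketch of the two schedules. 
It is not the code in RetryUtilities.java: the class name BatchSizeDecay, the 
method names, and the rounding in the linear case are assumptions made only for 
illustration. The linear variant interpolates down to 1 on the last attempt, so 
its values differ slightly from the round numbers used above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Hive patch.
final class BatchSizeDecay {

    // Exponential decay: divide the batch size by decayingFactor after each attempt.
    static List<Integer> exponential(int initialSize, int decayingFactor, int maxRetries) {
        if (decayingFactor < 2) {
            throw new IllegalArgumentException("decayingFactor must be at least 2");
        }
        List<Integer> sizes = new ArrayList<>();
        int size = initialSize;
        // Stop early once integer division has reduced the size to 0.
        for (int attempt = 0; attempt < maxRetries && size > 0; attempt++) {
            sizes.add(size);
            size /= decayingFactor;
        }
        return sizes;
    }

    // Linear decay: evenly spaced steps from initialSize down to 1 on the last attempt.
    // The integer rounding used here is an assumption.
    static List<Integer> linear(int initialSize, int maxRetries) {
        List<Integer> sizes = new ArrayList<>();
        if (maxRetries == 1) {
            sizes.add(initialSize);
            return sizes;
        }
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            long reduction = (long) (initialSize - 1) * attempt / (maxRetries - 1);
            sizes.add(initialSize - (int) reduction);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // numPartitions = 10,000 and maxRetries = 10, as in the example above.
        System.out.println("linear:      " + linear(10_000, 10));
        System.out.println("exponential: " + exponential(10_000, 2, 10));
    }
}
```

With these inputs the exponential schedule drops below 1,000 partitions per 
batch by the fifth attempt, while the linear schedule only gets there on its 
last attempt.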


- Vihang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58936/#review174792
-----------------------------------------------------------


On May 12, 2017, 9:35 p.m., Vihang Karajgaonkar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58936/
> -----------------------------------------------------------
> 
> (Updated May 12, 2017, 9:35 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Sergio Pena, and Sahil Takiar.
> 
> 
> Bugs: HIVE-16143
>     https://issues.apache.org/jira/browse/HIVE-16143
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16143 : Improve msck repair batching
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> d3ea824c21f2fbf98177cb12a18019416f36a3f9 
>   common/src/java/org/apache/hive/common/util/RetryUtilities.java 
> PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestRetryUtilities.java 
> PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/create_like.q 
> 38f384e4c547d3c93d510b89fccfbc2b8e2cba09 
>   itests/hive-blobstore/src/test/results/clientpositive/create_like.q.out 
> 0d362a716291637404a3859fe81068594d82c9e0 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
> 2ae1eacb68cef6990ae3f2050af0bed7c8e9843f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 917e565f28b2c9aaea18033ea3b6b20fa41fcd0a 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestMsckCreatePartitionsInBatches.java 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/msck_repair_0.q 
> 22542331621ca4ce5277c2f46a4264b7540a4d1e 
>   ql/src/test/queries/clientpositive/msck_repair_1.q 
> ea596cbbd2d4c230f2b5afbe379fc1e8836b6fbd 
>   ql/src/test/queries/clientpositive/msck_repair_2.q 
> d8338211e970ebac68a7471ee0960ccf2d51cba3 
>   ql/src/test/queries/clientpositive/msck_repair_3.q 
> fdefca121a2de361dbd19e7ef34fb220e1733ed2 
>   ql/src/test/queries/clientpositive/msck_repair_batchsize.q 
> e56e97ac36a6544f3e20478fdb0e8fa783a857ef 
>   ql/src/test/results/clientpositive/msck_repair_0.q.out 
> 2e0d9dc423071ebbd9a55606f196cf7752e27b1a 
>   ql/src/test/results/clientpositive/msck_repair_1.q.out 
> 3f2fe75b194f1248bd5c073dd7db6b71b2ffc2ba 
>   ql/src/test/results/clientpositive/msck_repair_2.q.out 
> 3f2fe75b194f1248bd5c073dd7db6b71b2ffc2ba 
>   ql/src/test/results/clientpositive/msck_repair_3.q.out 
> 3f2fe75b194f1248bd5c073dd7db6b71b2ffc2ba 
>   ql/src/test/results/clientpositive/msck_repair_batchsize.q.out 
> ba99024163a1f2c59d59e9ed7ea276c154c99d24 
>   ql/src/test/results/clientpositive/repair.q.out 
> c1834640a35500c521a904a115a718c94546df10 
> 
> 
> Diff: https://reviews.apache.org/r/58936/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vihang Karajgaonkar
> 
>
