Re: [PR] HBASE-29699 Scan#setLimit ignored in MapReduce jobs [hbase]

via GitHub Wed, 05 Nov 2025 17:09:01 -0800


junegunn commented on PR #7432:
URL: https://github.com/apache/hbase/pull/7432#issuecomment-3494354832


   I understand your point.
   
   > Not all users will look deeply into the javadoc
   
   True, and unfortunately, those users will still complain that 
`Scan#setLimit` doesn't work as expected no matter what. So we should:
   
   - Document the limitation of the method in the javadoc (_"this doesn't work 
with TableInputFormat"_),
   - And in case it's overlooked, print a warning message when serializing a 
Scan with a limit for an MR job.
   
   In order to do that, we need to introduce an internal version of 
`ProtobufUtil.toScan` that doesn't print the warning and use it in 
`RequestConverter.buildScanRequest`, so that regular client scans stay silent. 
However, the public `ProtobufUtil.toScan` cannot tell whether users have set 
the new per-split-limit parameter in their configuration or are already aware 
of the limitation, two cases where the warning would feel redundant. Users 
would then have to manually unset the limit to silence it, which is not ideal, 
so I'm not entirely sure about adding the warning.
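   A toy Java model of the split described above might look like the following. Everything in it is hypothetical: `ToyScan`, `toScan`, `toScanInternal` and the `warnings` list are illustrative stand-ins rather than HBase APIs, and warnings are collected in a list instead of logged so they are easy to observe. The point it tries to show is that the public entry point receives only the Scan, not the job configuration, so it has no way to skip the warning for users who already opted into per-split limits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are HBase APIs.
final class ScanWarningSketch {

    /** Minimal stand-in for a Scan: only the limit matters here (0 or less = no limit). */
    static final class ToyScan {
        final int limit;

        ToyScan(int limit) {
            this.limit = limit;
        }
    }

    /** Collected instead of logged so the behavior is easy to observe. */
    static final List<String> warnings = new ArrayList<>();

    /**
     * Public entry point, playing the role of ProtobufUtil.toScan when a Scan is
     * serialized for an MR job. It only sees the Scan, not the job Configuration,
     * so it cannot tell whether the user already knows about per-split limits.
     */
    static String toScan(ToyScan scan) {
        if (scan.limit > 0) {
            warnings.add("Scan limit " + scan.limit + " is not a global limit in MapReduce jobs");
        }
        return toScanInternal(scan);
    }

    /** Internal variant for the regular client RPC path (the buildScanRequest role): never warns. */
    static String toScanInternal(ToyScan scan) {
        return "Scan{limit=" + scan.limit + "}";
    }

    public static void main(String[] args) {
        toScanInternal(new ToyScan(10)); // regular client scan: silent
        toScan(new ToyScan(0));          // no limit: silent
        toScan(new ToyScan(10));         // limited scan serialized for an MR job: warns
        System.out.println(warnings);
    }
}
```

   The only way for a user to silence the public path in this model is to drop the limit, which is exactly the drawback raised above.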
   
   > introduce different meanings when using Scan limit 
   
   I thought it was acceptable, because we already have constructs that behave 
differently in parallel scenarios (e.g. stateful filters like `PageFilter` and 
`WhileMatchFilter`).
   
   > _This is because the filter is applied separately on different region 
servers._
   > 
   > https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/filter/PageFilter.html
   
   So I assumed it was already well understood that, in such cases, each split 
runs its own Scan. But maybe that's just me.
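   To make the per-split semantics concrete, here is a small self-contained simulation, assuming a limit greater than zero. It is a toy model, not HBase code: each inner list stands in for the rows of one input split, and `scanWithPerSplitLimit` is a hypothetical helper that caps every split independently, the way a per-split Scan (or a `PageFilter` evaluated on each region server) would.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: each inner list stands in for the rows of one input split.
final class PerSplitLimitDemo {

    /** Rows emitted when every split runs its own scan capped at {@code limit} (assumes limit > 0). */
    static List<String> scanWithPerSplitLimit(List<List<String>> splits, int limit) {
        List<String> emitted = new ArrayList<>();
        for (List<String> split : splits) {
            // Each split stops after `limit` rows without knowing what the other splits returned.
            int n = Math.min(limit, split.size());
            emitted.addAll(split.subList(0, n));
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("a1", "a2", "a3"),
                List.of("b1", "b2", "b3"),
                List.of("c1"));
        List<String> rows = scanWithPerSplitLimit(splits, 2);
        // 2 + 2 + 1 rows: the job-wide total exceeds the limit of 2.
        System.out.println(rows.size() + " " + rows);
    }
}
```

   Running `main` prints `5 [a1, a2, b1, b2, c1]`: with a limit of 2 per split, the job as a whole can return up to 2 rows times the number of splits.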


-- 
This is an automated message from the Apache Git Service.