[ https://issues.apache.org/jira/browse/HBASE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726607#action_12726607 ]
Chris K Wensel commented on HBASE-1605:
---------------------------------------

Good questions.

In SQL, LIMIT returns the first N rows of the result set, and is typically used with OFFSET to allow pagination.

In Cascading, the Limit operation only allows each of the M tasks to see N/M rows (accounting for remainders). There is no notion of OFFSET, since Limit in this case is really used for unit/integration testing or sampling.

Re HBase, you guys should choose the model that makes the most sense for typical HBase consumer applications. What I'm after is an even load across many mappers while, orthogonally, limiting the total number of rows processed.

Having this work with a Filter would also be very nice, i.e. give me 1k rows that satisfy this condition. But I guess if I want the first 1k rows that satisfy the filter, we might be limited to a single region (and a single mapper, as I read the code now).

So maybe there are two modes, sample and result. sample returns 'random' N rows (the top N/M from each region). result returns N ordered rows (ordered by virtue of coming from one region).

Anyway, just throwing that out there. The current use case would be happy with either, though 'result' is probably the most useful when coupled with HBASE-1172.

> TableInputFormat should support 'limit'
> ---------------------------------------
>
>                 Key: HBASE-1605
>                 URL: https://issues.apache.org/jira/browse/HBASE-1605
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Chris K Wensel
>
> Would be useful if TableInputFormat could be passed a 'limit' property value
> that limited the total result set to the value of 'limit'.
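
For illustration only, the N/M split mentioned in the comment (each of M tasks sees roughly N/M rows, accounting for remainders) could be sketched as below. This is a minimal sketch, not Cascading's or HBase's actual code: the class name `LimitSplit` and the method `perTaskLimits` are hypothetical, and handing the leftover rows to the first few tasks is an assumed policy.

```java
import java.util.Arrays;

public class LimitSplit {

    /**
     * Hypothetical helper: divides a total row limit N across M tasks.
     * Every task gets N / M rows, and the first (N % M) tasks get one
     * extra row, so the per-task limits always sum to exactly N.
     */
    static long[] perTaskLimits(long limit, int tasks) {
        if (limit < 0 || tasks <= 0) {
            throw new IllegalArgumentException("limit must be >= 0 and tasks must be > 0");
        }
        long base = limit / tasks;       // rows every task may see
        long remainder = limit % tasks;  // leftover rows to spread out
        long[] limits = new long[tasks];
        for (int i = 0; i < tasks; i++) {
            limits[i] = base + (i < remainder ? 1 : 0);
        }
        return limits;
    }

    public static void main(String[] args) {
        // A limit of 1000 rows spread over 3 mappers.
        System.out.println(Arrays.toString(perTaskLimits(1000, 3)));
        // prints [334, 333, 333]
    }
}
```

A split like this corresponds to the 'sample' mode: the load is even and the total is capped, but the rows are not the first N in key order. The 'result' mode would instead need a single ordered scan that stops after N rows.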