[jira] [Commented] (HBASE-8571) CopyTable and RowCounter don't seem to use setCaching setting

Doug Meil (JIRA) Tue, 21 May 2013 07:51:20 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663034#comment-13663034
 ]


Doug Meil commented on HBASE-8571:
----------------------------------

I realized that it does actually handle setCaching, but my comment was more 
about the implementation.  The Scan is explicitly configured in the job with 
every option *except* setCaching (e.g., startRow/stopRow, columns, cacheBlocks, 
etc.).  The caching value is never set on the Scan itself; it gets picked up 
much further down the line, from the Config instance in ClientScanner.  That 
indirection is just a little confusing, that's all.
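
To make the indirection concrete, here is a rough sketch of the fallback as I 
understand it.  This is not the actual ClientScanner code: the class name, the 
Map standing in for the Config instance, and the default of 100 are all 
placeholders of mine, and I am assuming that a non-positive caching value on 
the Scan means "not set" and that the property involved is 
hbase.client.scanner.caching.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not the real HBase ClientScanner logic. */
final class ScannerCachingSketch {
    // Assumed name of the client-side property that supplies the fallback.
    static final String CACHING_KEY = "hbase.client.scanner.caching";
    // Placeholder default; the real default differs between HBase versions.
    static final int ASSUMED_DEFAULT = 100;

    /**
     * Returns the caching value a scanner would end up using:
     * the value set on the Scan if it is positive, otherwise the
     * configured value, otherwise the placeholder default.
     */
    static int effectiveCaching(int scanCaching, Map<String, String> conf) {
        if (scanCaching > 0) {
            return scanCaching; // explicitly set on the Scan
        }
        String configured = conf.get(CACHING_KEY);
        if (configured != null) {
            return Integer.parseInt(configured.trim()); // picked up from config
        }
        return ASSUMED_DEFAULT;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CACHING_KEY, "500");
        // Scan built the way RowCounter builds it: caching never set.
        System.out.println(effectiveCaching(-1, conf));
        // Scan with caching set explicitly: the config is ignored.
        System.out.println(effectiveCaching(50, conf));
    }
}
```

If that reading is right, a job like RowCounter only gets a non-default 
caching value when the property is present in the job's configuration, which 
is why nothing about it shows up where the Scan is built.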



                
> CopyTable and RowCounter don't seem to use setCaching setting
> -------------------------------------------------------------
>
>                 Key: HBASE-8571
>                 URL: https://issues.apache.org/jira/browse/HBASE-8571
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Doug Meil
>
> Maybe it's just me, but I've been looking on trunk and I don't see where 
> either the RowCounter or the CopyTable MapReduce job can adjust the caching 
> (setCaching) on the Scan instance.
> Example from RowCounter:
> {code}
>     Job job = new Job(conf, NAME + "_" + tableName);
>     job.setJarByClass(RowCounter.class);
>     Scan scan = new Scan();
>     scan.setCacheBlocks(false);
>     Set<byte []> qualifiers = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
>     if (startKey != null && !startKey.equals("")) {
>       scan.setStartRow(Bytes.toBytes(startKey));
>     }
>     if (endKey != null && !endKey.equals("")) {
>       scan.setStopRow(Bytes.toBytes(endKey));
>     }
>     scan.setFilter(new FirstKeyOnlyFilter());
>     if (sb.length() > 0) {
>       for (String columnName : sb.toString().trim().split(" ")) {
>         String [] fields = columnName.split(":");
>         if(fields.length == 1) {
>           scan.addFamily(Bytes.toBytes(fields[0]));
>         } else {
>           byte[] qualifier = Bytes.toBytes(fields[1]);
>           qualifiers.add(qualifier);
>           scan.addColumn(Bytes.toBytes(fields[0]), qualifier);
>         }
>       }
>     }
>     // specified column may or may not be part of first key value for the row.
>     // Hence do not use FirstKeyOnlyFilter if scan has columns, instead use
>     // FirstKeyValueMatchingQualifiersFilter.
>     if (qualifiers.size() == 0) {
>       scan.setFilter(new FirstKeyOnlyFilter());
>     } else {
>       scan.setFilter(new FirstKeyValueMatchingQualifiersFilter(qualifiers));
>     }
>     job.setOutputFormatClass(NullOutputFormat.class);
>     TableMapReduceUtil.initTableMapperJob(tableName, scan,
>       RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);
>     job.setNumReduceTasks(0);
>     return job;
> {code}
> TableMapReduceUtil only serializes the Scan into the job; it doesn't adjust 
> any of the settings.
> Maybe I'm missing something, but this seems like a problem.

