I have this EXACT same problem and I thought it was just me. For some
reason my Eclipse just hangs when I try to extend PageRowFilter, like the
following:

Scanner scanner = table.getScanner(new String[] { colfam1 + "nodeid" },
        "999-1", 2280278, new PageRowFilter(1) {
            public boolean filterColumn(byte[] rowKey, byte[] colKey,
                    byte[] data) {
                return true;
            }
        });

I had to type it by hand. Also, when I run this small program, HBase
ends up giving me this error:

--------------------------
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException:
Trying to contact region server 127.0.0.1:49847 for region
mytable,,1239130171356, row '999-1', but failed after 10 attempts.
Exceptions:
java.io.IOException: Call to /127.0.0.1:49847 failed on local exception: java.io.EOFException
java.io.IOException: Call to /127.0.0.1:49847 failed on local exception: java.io.EOFException

        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:858)
        at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1594)
        at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1539)
        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:862)
        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:838)
        at mytest.HbastTester.main(HbastTester.java:98)
-----------------------------------
My region server is active on port 60020 and the info server is on 60030,
and I'm running HBase in local mode. But other, simpler non-filter-based
scanners work just fine.
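
Going by the deployment note in Lars's mail quoted below, my guess is that
the region server cannot deserialize my anonymous PageRowFilter subclass,
since that class only exists on the client side. A minimal sketch of what
I think avoids the problem - using the stock PageRowFilter with no
subclass (same table, columns, and timestamp as in my snippet above):

Scanner scanner = table.getScanner(new String[] { colfam1 + "nodeid" },
        "999-1", 2280278,
        // stock filter class, already on every region server's classpath
        new PageRowFilter(1));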

Any clues would be helpful.
thanks!
check_writer.



Rakhi Khatwani wrote:
> 
> Hi,
>            I did try the filter... but using ColumnValueFilter. I
> declared a ColumnValueFilter as follows:
> 
> public class TableInputFilter extends TableInputFormat
>         implements JobConfigurable {
> 
>     public void configure(final JobConf jobConf) {
> 
>         setHtable(tablename);
> 
>         setInputColumns(columnName);
> 
>         final RowFilterInterface colFilter =
>                 new ColumnValueFilter("Status:".getBytes(),
>                         ColumnValueFilter.CompareOp.EQUAL,
>                         "UNCOLLECTED".getBytes());
>         setRowFilter(colFilter);
>     }
> }
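> 
> For reference, a version of the above that should actually compile
> against the 0.19 API might look roughly like this - a sketch only: the
> table name is a placeholder, and I am assuming the protected
> setHTable/setInputColumns/setRowFilter setters from TableInputFormatBase:
> 
> import java.io.IOException;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.filter.ColumnValueFilter;
> import org.apache.hadoop.hbase.filter.RowFilterInterface;
> import org.apache.hadoop.hbase.mapred.TableInputFormat;
> import org.apache.hadoop.mapred.JobConf;
> 
> public class TableInputFilter extends TableInputFormat {
> 
>     public void configure(final JobConf jobConf) {
>         try {
>             // "mytable" is a placeholder for the real table name
>             setHTable(new HTable(new HBaseConfiguration(jobConf), "mytable"));
>             setInputColumns(new byte[][] { "Status:".getBytes() });
>             // pass on only rows whose Status: cell equals UNCOLLECTED
>             final RowFilterInterface colFilter =
>                     new ColumnValueFilter("Status:".getBytes(),
>                             ColumnValueFilter.CompareOp.EQUAL,
>                             "UNCOLLECTED".getBytes());
>             setRowFilter(colFilter);
>         } catch (IOException e) {
>             throw new RuntimeException("could not set up table", e);
>         }
>     }
> }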
> 
> and then I use my class as the input format for my map function.
> 
> 
> In my map function, I set my log to display the value of my Status
> column family.
> 
> When I execute my map reduce job, it displays "Status:: Uncollected" for
> some rows and Status = "Collected" for the rest of the rows.
> 
> But what I want is to send to the map only those records whose 'Status:'
> is UNCOLLECTED.
> 
> I even considered using the filterRow method described by the API:
> 
>     boolean filterRow(SortedMap<byte[], Cell> columns)
>         Filter on the fully assembled row.
> 
> (http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/filter/ColumnValueFilter.html#filterRow%28java.util.SortedMap%29)
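> 
> If I did go the subclassing route (with the deployment caveat from
> Lars's mail below), I imagine the override would look something like
> this - a sketch only, the class name is made up, and I'm assuming
> ColumnValueFilter has the usual no-arg constructor for Writable
> deserialization:
> 
> import java.util.SortedMap;
> import org.apache.hadoop.hbase.filter.ColumnValueFilter;
> import org.apache.hadoop.hbase.io.Cell;
> 
> public class UncollectedOnlyFilter extends ColumnValueFilter {
> 
>     public UncollectedOnlyFilter() {
>         super();  // no-arg constructor for deserialization on the server
>     }
> 
>     public UncollectedOnlyFilter(final byte[] column, final byte[] value) {
>         super(column, CompareOp.EQUAL, value);
>     }
> 
>     public boolean filterRow(final SortedMap<byte[], Cell> columns) {
>         // custom whole-row logic would go here; returning true means
>         // "filter this row out" - for now just keep the stock behaviour
>         return super.filterRow(columns);
>     }
> }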
> 
> But as soon as I type colFilter followed by a '.', my Eclipse hangs.
> It's really weird... I have tried it on 3 different machines (2 machines
> on Linux running Eclipse Ganymede 3.4 and one on Windows using MyEclipse).
> 
> 
> I don't know if I am going wrong somewhere.
> 
> Thanks,
> Raakhi
> 
> 
> On Tue, Apr 7, 2009 at 7:18 PM, Lars George <[email protected]> wrote:
> 
>> Hi Rakhi,
>>
>> The way the filters work is that you either use the supplied filters or
>> create your own subclasses - but then you will have to deploy that class
>> to all RegionServers and add it to their respective hbase-env.sh (in the
>> "export HBASE_CLASSPATH" variable). We are currently discussing whether
>> this could be done dynamically
>> (https://issues.apache.org/jira/browse/HBASE-1288).
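>>
>> On each region server that could look like the following (the jar path
>> is just an example):
>>
>>   # conf/hbase-env.sh on every region server
>>   export HBASE_CLASSPATH=/path/to/my-filters.jar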
>>
>> Once you have done that, or if you use one of the supplied filters, you
>> can assign the filter by overriding the TableInputFormat's configure()
>> method, like so:
>>
>>  public void configure(JobConf job) {
>>     RegExpRowFilter filter = new RegExpRowFilter("ABC.*");
>>     setRowFilter(filter);
>>  }
>>
>> As Tim points out, setting the whole thing up is done in your main M/R
>> Tool-based application, similar to:
>>
>>  JobConf job = new JobConf(...);
>>  TableMapReduceUtil.initTableMapJob("<table-name>", "<columns>",
>>      IdentityTableMap.class, ImmutableBytesWritable.class,
>>      RowResult.class, job);
>>  job.setReducerClass(MyTableReduce.class);
>>  job.setInputFormat(MyTableInputFormat.class);
>>  job.setOutputFormat(MyTableOutputFormat.class);
>>
>> Of course this depends on which classes you want to replace, and on
>> whether this is a Reduce-oriented job (meaning a default identity +
>> filter map with all the work done in the Reduce phase) or the other way
>> around. But the principles and the filtering are the same.
>>
>> HTH,
>> Lars
>>
>>
>>
>> Rakhi Khatwani wrote:
>>
>>> Thanks Ryan, I will try that
>>>
>>> On Tue, Apr 7, 2009 at 3:05 PM, Ryan Rawson <[email protected]> wrote:
>>>
>>>
>>>
>>>> There is a server-side mechanism to filter rows; it's found in the
>>>> org.apache.hadoop.hbase.filter package. I'm not sure exactly how this
>>>> interoperates with the TableInputFormat.
>>>>
>>>> Setting a filter to reduce the # of rows returned is pretty much
>>>> exactly what you want.
>>>>
>>>> On Tue, Apr 7, 2009 at 2:26 AM, Rakhi Khatwani
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> I have a map reduce program with which I read from an HBase table.
>>>>> In my map program I check whether the value of a column is xxx; if
>>>>> yes, I continue with processing, else I skip the row.
>>>>> However, if my table is really big, most of my time in the map gets
>>>>> wasted processing unwanted rows.
>>>>> Is there any way through which we could send a subset of rows (based
>>>>> on the value of a particular column family) to the map?
>>>>>
>>>>> I have also gone through TableInputFormatBase but am not able to
>>>>> figure out how to set the input format if we are using the
>>>>> TableMapReduceUtil class to initialize table map jobs. Or is there
>>>>> any other way I could use it?
>>>>>
>>>>> Thanks in Advance,
>>>>> Raakhi.
>>>
>>>
>>
> 
> 
