Jeff Zhang wrote:
Mridul,
What do you mean by "Counters are not synchronized in 'real-time'"?
As far as I know, the JT aggregates Counters from the TTs, so I think the
aggregated Counter in the JT should be correct.
Aggregate counters are guaranteed to be correct at the end of a logical
state - not necessarily in between.
Consider cases of mapper/reducer task re-execution, caching at the task
nodes (counters piggyback on the heartbeat - and so are sent only every
XX seconds), etc.
So trying to limit output based on a counter would typically give
suboptimal results.
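To illustrate the lag, here is a minimal plain-Java simulation, not Hadoop code. The names Task, heartbeat(), aggregated() and run() are hypothetical stand-ins, and the model assumes every task writes one row per tick and reports its count only on a heartbeat. It sketches why a limit checked against the aggregated value tends to overshoot.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleCounterDemo {
    /** Hypothetical stand-in for a task: counts locally, reports only on heartbeat. */
    static final class Task {
        long local;     // rows this task has actually written
        long reported;  // value last sent to the aggregator
        void writeRow() { local++; }
        void heartbeat() { reported = local; }
    }

    /** Sum of the last reported values, i.e. what a job-level view would see. */
    static long aggregated(List<Task> tasks) {
        long sum = 0;
        for (Task t : tasks) sum += t.reported;
        return sum;
    }

    /**
     * Every task writes one row per tick for as long as the aggregated
     * counter is below the limit; heartbeats happen every heartbeatEvery ticks.
     * Returns the number of rows actually written.
     */
    static long run(int numTasks, int heartbeatEvery, long limit) {
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) tasks.add(new Task());
        long written = 0;
        for (int tick = 1; aggregated(tasks) < limit; tick++) {
            for (Task t : tasks) { t.writeRow(); written++; }
            if (tick % heartbeatEvery == 0) {
                for (Task t : tasks) t.heartbeat();
            }
        }
        return written;
    }

    public static void main(String[] args) {
        // 4 tasks, heartbeat every 5 ticks, limit of 10 rows
        System.out.println(run(4, 5, 10)); // prints 20
    }
}
```

In this toy model the aggregate stays at 0 until the first heartbeat, by which time 20 rows have already been written against a limit of 10.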
Regards,
Mridul
On Tue, Jan 26, 2010 at 3:08 PM, Mridul Muralidharan
<[email protected]> wrote:
Jeff Zhang wrote:
*See my comments below*
On Mon, Jan 25, 2010 at 3:22 PM, Something Something <
[email protected]> wrote:
If I set # of reduce tasks to 1 using setNumReduceTasks(1), would the class
be instantiated only on one machine.. always? I mean if I have a cluster of
say 1 master, 10 workers & 3 zookeepers, is the Reducer class guaranteed to
be instantiated only on 1 machine?
*--Yes*
If the answer is yes, then I will use a static variable as a counter to see
how many rows have been added to my HBase table so far. In my use case, I
want to write only N rows to a table. Is there a better way to do
this? Please let me know. Thanks.
*--Maybe you can use a Counter to track the number of rows you add to HBase,
then you do not need to limit the number of reduce tasks to 1*
Counters are not synchronized in 'real-time', so you can't use them to
limit at addition time, imo.
They are more for aggregation, not realtime messaging.
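For the single-reducer case asked about above, a local count kept inside the one writing instance avoids relying on aggregated Counters. A minimal plain-Java sketch follows. LimitedRowWriter and its Consumer sink are hypothetical names standing in for the reducer and the HBase write, not Hadoop or HBase API. A retried or speculative task attempt would start with a fresh count, so this alone is not a hard guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LimitedRowWriter {
    private final long maxRows;           // N: the most rows this instance may write
    private final Consumer<String> sink;  // hypothetical stand-in for the table write
    private long written = 0;             // local count, visible immediately

    public LimitedRowWriter(long maxRows, Consumer<String> sink) {
        this.maxRows = maxRows;
        this.sink = sink;
    }

    /** Writes the row if still under the limit; returns false once the limit is reached. */
    public boolean write(String row) {
        if (written >= maxRows) {
            return false;
        }
        sink.accept(row);
        written++;
        return true;
    }

    /** Demo helper: offers 'offered' rows and returns how many were stored. */
    static int demo(int offered, long maxRows) {
        List<String> table = new ArrayList<>();
        LimitedRowWriter writer = new LimitedRowWriter(maxRows, table::add);
        for (int i = 0; i < offered; i++) {
            writer.write("row-" + i);
        }
        return table.size();
    }

    public static void main(String[] args) {
        System.out.println(demo(100, 10)); // prints 10
    }
}
```

An instance field is used rather than a static variable, since only one instance does the writing in this sketch.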
- Mridul