Yes, you are right. It should work automatically once the annotation is added to his demux parser.
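
To make that concrete: HBaseWriter runs the demux parsers inside the collector and uses the annotation to pick the destination table. An annotated parser looks roughly like the untested sketch below. The class, table, and column-family names are made up, and the annotation and package names are from memory of the 0.5 tree, so check them against your checkout:

import org.apache.hadoop.chukwa.extraction.demux.processor.Table;
import org.apache.hadoop.chukwa.extraction.demux.processor.Tables;
import org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical parser; the @Table annotation tells HBaseWriter which
// HBase table and column family this parser's records are written to.
@Tables(annotations = {
    @Table(name = "MyAppMetrics", columnFamily = "stats")
})
public class MyAppParser extends AbstractProcessor {
  @Override
  protected void parse(String recordEntry,
                       OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
                       Reporter reporter) throws Throwable {
    ChukwaRecord record = new ChukwaRecord();
    record.setTime(System.currentTimeMillis());
    record.add("body", recordEntry);  // real parsing of the entry goes here
    // buildGenericRecord fills in the shared key (time + data type + host)
    buildGenericRecord(record, recordEntry, record.getTime(), "MyAppMetrics");
    output.collect(key, record);
  }
}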
regards,
Eric

On Sat, Oct 23, 2010 at 1:27 PM, Corbin Hoenes <[email protected]> wrote:
> +1
>
> I imagine it is just another pipelineable class loaded into the
> collector? If so, Bill's scenario would work.
>
> Sent from my iPhone
>
> On Oct 23, 2010, at 12:59 PM, Bill Graham <[email protected]> wrote:
>
>> Eric, I'm also curious about how the HBase integration works. Do you
>> have time to write something up on it? I'm interested in the
>> possibility of extending what's there to write my own custom data into
>> HBase from a collector, while said data also continues through to HDFS
>> as it does currently.
>>
>> On Fri, Oct 22, 2010 at 5:21 PM, Corbin Hoenes <[email protected]> wrote:
>>>
>>> Eric, in Chukwa 0.5 is HBase the final store instead of HDFS? What
>>> format will the HBase data be in? (e.g. a ChukwaRecord object?
>>> Something user-configurable?)
>>>
>>> Sent from my iPhone
>>>
>>> On Oct 22, 2010, at 8:48 AM, Eric Yang <[email protected]> wrote:
>>>
>>>> Hi Matt,
>>>>
>>>> This is expected in Chukwa archives. When an agent is unable to post
>>>> to the collector, it retries posting the same data to another
>>>> collector, or to the same collector when no other is available. Under
>>>> high load, a collector may have written data without the
>>>> acknowledgement making it back to the agent. The Chukwa philosophy is
>>>> to retry until an acknowledgement is received, and to filter
>>>> duplicates after the data has been received.
>>>>
>>>> Duplicate filtering in Chukwa 0.3.0 depends on the data being loaded
>>>> into MySQL: rows with the same primary key update the same row, which
>>>> removes duplicates. It would be possible to build a duplicate
>>>> detection step prior to demux that filters data based on sequence ID
>>>> + data type + csource (host), but this hasn't been implemented
>>>> because the primary key update method works well for my use case.
>>>>
>>>> In Chukwa 0.5, we treat duplication the same way as in Chukwa 0.3:
>>>> any duplicated row in HBase is replaced, based on timestamp + HBase
>>>> row key.
>>>>
>>>> regards,
>>>> Eric
>>>>
>>>> On Thu, Oct 21, 2010 at 8:22 PM, Matt Davies <[email protected]> wrote:
>>>>>
>>>>> Hey everyone,
>>>>>
>>>>> I have a situation where I'm seeing duplicated data downstream,
>>>>> before the demux process. It appears this happens during high system
>>>>> loads, and we are still on the 0.3.0 series.
>>>>>
>>>>> We have validated that there is a single, unique entry in our source
>>>>> file, which then shows up a random number of times before we see it
>>>>> in demux. So it appears that duplication is happening somewhere
>>>>> between the agent and the collector.
>>>>>
>>>>> Has anyone else seen this? Any ideas as to why we are seeing this
>>>>> during high system loads but not during lower loads?
>>>>>
>>>>> TIA,
>>>>> Matt
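
P.S. For the collector side of Bill's scenario: a custom pipeline stage can do its own HBase writes and still hand every chunk to the next writer (e.g. SeqFileWriter for HDFS), and would then be listed in chukwaCollector.pipeline ahead of the HDFS writer. Below is a rough, untested sketch against the 0.5 writer interfaces; the class and method names are assumptions, so verify them against the source.

import java.util.List;
import org.apache.hadoop.chukwa.Chunk;
import org.apache.hadoop.chukwa.datacollection.writer.ChukwaWriter;
import org.apache.hadoop.chukwa.datacollection.writer.PipelineableWriter;
import org.apache.hadoop.chukwa.datacollection.writer.WriterException;
import org.apache.hadoop.conf.Configuration;

public class HBaseTeeWriter extends PipelineableWriter {

  public void init(Configuration conf) throws WriterException {
    // open the HBase connection here
  }

  public CommitStatus add(List<Chunk> chunks) throws WriterException {
    for (Chunk chunk : chunks) {
      // derive and write custom rows into HBase here
    }
    // hand the unmodified chunks to the next stage so they still reach HDFS
    return (next == null) ? ChukwaWriter.COMMIT_OK : next.add(chunks);
  }

  public void close() throws WriterException {
    // close the HBase connection here
  }
}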

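P.P.S. For Matt's duplicate problem, the pre-demux filter described above (keyed on sequence ID + data type + source host) could start as simple as this untested sketch. The Chunk accessor names are assumed from the 0.3 API, and a real version would need to expire old keys (e.g. by time window) so the set doesn't grow without bound.

import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.chukwa.Chunk;

public class DuplicateChunkFilter {
  // Keys already seen; a production version should evict old entries.
  private final Set<String> seen = new HashSet<String>();

  // Returns true the first time a (seqID, dataType, source) triple is
  // seen, false for any later duplicate of the same triple.
  public synchronized boolean accept(Chunk chunk) {
    String dedupKey = chunk.getSeqID() + "/" + chunk.getDataType()
        + "/" + chunk.getSource();
    return seen.add(dedupKey);  // Set.add returns false on duplicates
  }
}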