Yes, you are right. It should work automatically once the annotation is added to his demux parser.
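
To make that concrete: HBaseWriter runs the demux parsers inside the collector and uses the annotation to pick the destination table. An annotated parser looks roughly like the untested sketch below. The class, table, and column-family names are made up, and the annotation and package names are from memory of the 0.5 tree, so check them against your checkout:

import org.apache.hadoop.chukwa.extraction.demux.processor.Table;
import org.apache.hadoop.chukwa.extraction.demux.processor.Tables;
import org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical parser; the @Table annotation tells HBaseWriter which
// HBase table and column family this parser's records are written to.
@Tables(annotations = {
    @Table(name = "MyAppMetrics", columnFamily = "stats")
})
public class MyAppParser extends AbstractProcessor {
  @Override
  protected void parse(String recordEntry,
                       OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
                       Reporter reporter) throws Throwable {
    ChukwaRecord record = new ChukwaRecord();
    record.setTime(System.currentTimeMillis());
    record.add("body", recordEntry);  // real parsing of the entry goes here
    // buildGenericRecord fills in the shared key (time + data type + host)
    buildGenericRecord(record, recordEntry, record.getTime(), "MyAppMetrics");
    output.collect(key, record);
  }
}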
regards,
Eric

On Sat, Oct 23, 2010 at 1:27 PM, Corbin Hoenes <[email protected]> wrote:
> +1
>
> I imagine it is just another pipelineable class loaded into the
> collector? If so, Bill's scenario would work.
>
> Sent from my iPhone
>
> On Oct 23, 2010, at 12:59 PM, Bill Graham <[email protected]> wrote:
>
>> Eric, I'm also curious about how the HBase integration works. Do you
>> have time to write something up on it? I'm interested in the
>> possibility of extending what's there to write my own custom data into
>> HBase from a collector, while said data also continues through to HDFS
>> as it does currently.
>>
>> On Fri, Oct 22, 2010 at 5:21 PM, Corbin Hoenes <[email protected]> wrote:
>>>
>>> Eric, in Chukwa 0.5 is HBase the final store instead of HDFS? What
>>> format will the HBase data be in? (e.g. a ChukwaRecord object?
>>> Something user-configurable?)
>>>
>>> Sent from my iPhone
>>>
>>> On Oct 22, 2010, at 8:48 AM, Eric Yang <[email protected]> wrote:
>>>
>>>> Hi Matt,
>>>>
>>>> This is expected in Chukwa archives. When an agent is unable to post
>>>> to the collector, it retries posting the same data to another
>>>> collector, or to the same collector when no other is available. Under
>>>> high load, a collector may have written data without the
>>>> acknowledgement making it back to the agent. The Chukwa philosophy is
>>>> to retry until an acknowledgement is received, and to filter
>>>> duplicates after the data has been received.
>>>>
>>>> Duplicate filtering in Chukwa 0.3.0 depends on the data being loaded
>>>> into MySQL: rows with the same primary key update the same row, which
>>>> removes duplicates. It would be possible to build a duplicate
>>>> detection step prior to demux that filters data based on sequence ID
>>>> + data type + csource (host), but this hasn't been implemented
>>>> because the primary key update method works well for my use case.
>>>>
>>>> In Chukwa 0.5, we treat duplication the same way as in Chukwa 0.3:
>>>> any duplicated row in HBase is replaced, based on timestamp + HBase
>>>> row key.
>>>>
>>>> regards,
>>>> Eric
>>>>
>>>> On Thu, Oct 21, 2010 at 8:22 PM, Matt Davies <[email protected]> wrote:
>>>>>
>>>>> Hey everyone,
>>>>>
>>>>> I have a situation where I'm seeing duplicated data downstream,
>>>>> before the demux process. It appears this happens during high system
>>>>> loads, and we are still on the 0.3.0 series.
>>>>>
>>>>> We have validated that there is a single, unique entry in our source
>>>>> file, which then shows up a random number of times before we see it
>>>>> in demux. So it appears that duplication is happening somewhere
>>>>> between the agent and the collector.
>>>>>
>>>>> Has anyone else seen this? Any ideas as to why we are seeing this
>>>>> during high system loads but not during lower loads?
>>>>>
>>>>> TIA,
>>>>> Matt
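
P.S. For the collector side of Bill's scenario: a custom pipeline stage can do its own HBase writes and still hand every chunk to the next writer (e.g. SeqFileWriter for HDFS), and would then be listed in chukwaCollector.pipeline ahead of the HDFS writer. Below is a rough, untested sketch against the 0.5 writer interfaces; the class and method names are assumptions, so verify them against the source.

import java.util.List;
import org.apache.hadoop.chukwa.Chunk;
import org.apache.hadoop.chukwa.datacollection.writer.ChukwaWriter;
import org.apache.hadoop.chukwa.datacollection.writer.PipelineableWriter;
import org.apache.hadoop.chukwa.datacollection.writer.WriterException;
import org.apache.hadoop.conf.Configuration;

public class HBaseTeeWriter extends PipelineableWriter {

  public void init(Configuration conf) throws WriterException {
    // open the HBase connection here
  }

  public CommitStatus add(List<Chunk> chunks) throws WriterException {
    for (Chunk chunk : chunks) {
      // derive and write custom rows into HBase here
    }
    // hand the unmodified chunks to the next stage so they still reach HDFS
    return (next == null) ? ChukwaWriter.COMMIT_OK : next.add(chunks);
  }

  public void close() throws WriterException {
    // close the HBase connection here
  }
}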

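P.P.S. For Matt's duplicate problem, the pre-demux filter described above (keyed on sequence ID + data type + source host) could start as simple as this untested sketch. The Chunk accessor names are assumed from the 0.3 API, and a real version would need to expire old keys (e.g. by time window) so the set doesn't grow without bound.

import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.chukwa.Chunk;

public class DuplicateChunkFilter {
  // Keys already seen; a production version should evict old entries.
  private final Set<String> seen = new HashSet<String>();

  // Returns true the first time a (seqID, dataType, source) triple is
  // seen, false for any later duplicate of the same triple.
  public synchronized boolean accept(Chunk chunk) {
    String dedupKey = chunk.getSeqID() + "/" + chunk.getDataType()
        + "/" + chunk.getSource();
    return seen.add(dedupKey);  // Set.add returns false on duplicates
  }
}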