Correction, the data has been written to HDFS correctly. The data was stuck at post-processing because the postProcess program crashed. I still need to determine the cause of the postProcess crash. I think the modified SeqFileWriter does what I wanted, and I will implement next.add() so that the ordering of the writers can be interchanged.
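Here is roughly what I have in mind for the SeqFileWriter change -- a minimal sketch, assuming the current ChukwaWriter/PipelineableWriter API (add() returning a CommitStatus, and PipelineableWriter exposing the next field via setNextStage()). The writeToHdfs() helper below is just a placeholder for the existing sequence-file logic, not a real Chukwa method:

    import java.util.List;

    import org.apache.hadoop.chukwa.Chunk;
    import org.apache.hadoop.chukwa.datacollection.writer.PipelineableWriter;
    import org.apache.hadoop.chukwa.datacollection.writer.WriterException;
    import org.apache.hadoop.conf.Configuration;

    // Sketch only: shows where the forwarding call goes. writeToHdfs() is a
    // placeholder for SeqFileWriter's existing sequence-file append logic.
    public class PipelineableSeqFileWriter extends PipelineableWriter {

      @Override
      public void init(Configuration conf) throws WriterException {
        // existing SeqFileWriter init (open the HDFS sequence file, etc.)
      }

      @Override
      public CommitStatus add(List<Chunk> chunks) throws WriterException {
        CommitStatus status = writeToHdfs(chunks);
        // Forward the same chunks downstream so a stage configured after
        // this one (e.g. SocketTeeWriter) still sees the data. Without
        // this call, any writer placed after SeqFileWriter in the pipeline
        // gets nothing, which is exactly the symptom described below.
        if (next != null) {
          next.add(chunks);
        }
        return status;
      }

      @Override
      public void close() throws WriterException {
        // existing SeqFileWriter close logic
      }

      private CommitStatus writeToHdfs(List<Chunk> chunks) throws WriterException {
        // ... existing SeqFileWriter logic that appends the chunks to the
        // .chukwa sequence file goes here ...
        return COMMIT_OK;
      }
    }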
Regards,
Eric

On 12/18/09 8:59 AM, "Eric Yang" <[email protected]> wrote:

> I'd like to make a T on the incoming data. One writer goes into HDFS, and
> another writer enables real-time pub/sub to monitor the data. In my case,
> the data is mirrored, not filtered. However, I am not getting the right
> result, because it seems the data isn't getting written into HDFS
> regardless of the ordering of the writers.
>
> Regards,
> Eric
>
> On 12/17/09 9:53 PM, "Ariel Rabkin" <[email protected]> wrote:
>
>> What's the use case for this?
>>
>> The original motivation for pipelined writers was so that we could do
>> things like filtering before data got written. Then it occurred to me
>> that SocketTeeWriter fit fairly naturally into a pipeline.
>>
>> Putting it "after" SeqFileWriter wouldn't be too bad --
>> SeqFileWriter.add() would need to call next.add(). But I would be
>> hesitant to commit that change without a really clear use case.
>>
>> --Ari
>>
>> On Thu, Dec 17, 2009 at 8:39 PM, Eric Yang <[email protected]> wrote:
>>
>>> It works fine after I put SocketTeeWriter first. What needs to be
>>> implemented in SeqFileWriter to be able to pipe correctly?
>>>
>>> Regards,
>>> Eric
>>>
>>> On 12/17/09 5:26 PM, "[email protected]" <[email protected]> wrote:
>>>
>>>> Put the SocketTeeWriter first.
>>>>
>>>> Sent from my iPhone; please excuse typos and brevity.
>>>>
>>>> On Dec 17, 2009, at 8:12 PM, Eric Yang <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd set up SocketTeeWriter by itself and had data streaming to the
>>>>> next socket reader program. When I tried to configure two writers,
>>>>> i.e., SeqFileWriter followed by SocketTeeWriter, it didn't work
>>>>> because SeqFileWriter doesn't extend PipelineableWriter. I went
>>>>> ahead and extended SeqFileWriter as a PipelineableWriter,
>>>>> implemented the setNextStage method, and configured the collector
>>>>> with:
>>>>>
>>>>> <property>
>>>>>   <name>chukwaCollector.writerClass</name>
>>>>>   <value>org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter</value>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>>   <name>chukwaCollector.pipeline</name>
>>>>>   <value>org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter,org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter</value>
>>>>> </property>
>>>>>
>>>>> SeqFileWriter writes the data correctly, but when I connect to
>>>>> SocketTeeWriter, no data is visible in SocketTeeWriter. Commands
>>>>> work fine, but data streaming doesn't happen. How do I configure the
>>>>> collector and PipelineStageWriter to be able to write data into
>>>>> multiple writers? Is there something in SeqFileWriter that could
>>>>> prevent this from working?
>>>>>
>>>>> Regards,
>>>>> Eric
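P.S. For testing the tee side, a bare-bones client sketch follows. The default port 9094, the "RAW all" command line, the one-line status reply, and the 32-bit length prefix per chunk are assumptions based on the SocketTeeWriter documentation; double-check them against the version you are running:

    import java.io.DataInputStream;
    import java.io.PrintStream;
    import java.net.Socket;

    // Hypothetical tee client: connects to a collector's SocketTeeWriter
    // and dumps the raw bytes of every matching chunk to stdout.
    public class TeeReader {
      public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        Socket sock = new Socket(host, 9094); // assumed default tee port

        PrintStream cmd = new PrintStream(sock.getOutputStream(), true);
        cmd.println("RAW all"); // assumed command: raw chunk bytes, no filter

        DataInputStream in = new DataInputStream(sock.getInputStream());

        // Consume the one-line status reply before the binary stream
        // starts, so it is not misread as a length prefix.
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
          // skip status line
        }

        // Each chunk arrives as a 32-bit big-endian length followed by
        // that many raw payload bytes; dump the payload to stdout.
        while (true) {
          int len = in.readInt();
          byte[] data = new byte[len];
          in.readFully(data);
          System.out.write(data);
          System.out.flush();
        }
      }
    }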
