+1 In general I think you would just need to parse the interesting fields via a Java pcap format reader (or do the byte reading yourself; the format is pretty simple and well documented: http://wiki.wireshark.org/Development/LibpcapFileFormat), put them into a Writable object, and write them to HDFS as a SequenceFile.
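As a rough, untested sketch of that option: the converter below does the byte reading itself per the libpcap layout on the page linked above, keying each record by its capture timestamp. The class name PcapToSequenceFile and the command-line argument handling are made up for illustration; it assumes a classic microsecond-resolution trace.

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

public class PcapToSequenceFile {

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    DataInputStream in = new DataInputStream(new FileInputStream(args[0]));

    // 24-byte global header; the magic number tells us the byte order.
    byte[] global = new byte[24];
    in.readFully(global);
    boolean littleEndian = (global[0] & 0xff) == 0xd4;  // 0xd4c3b2a1 on disk

    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[1]), LongWritable.class, BytesWritable.class);

    byte[] recHdr = new byte[16];
    try {
      while (true) {
        in.readFully(recHdr);  // ts_sec, ts_usec, incl_len, orig_len
        long tsSec   = readUInt32(recHdr, 0, littleEndian);
        long tsUsec  = readUInt32(recHdr, 4, littleEndian);
        int  inclLen = (int) readUInt32(recHdr, 8, littleEndian);

        byte[] packet = new byte[inclLen];
        in.readFully(packet);

        // Key: capture timestamp in microseconds; value: raw packet bytes.
        writer.append(new LongWritable(tsSec * 1000000L + tsUsec),
                      new BytesWritable(packet));
      }
    } catch (EOFException done) {
      // normal end of trace
    } finally {
      writer.close();
      in.close();
    }
  }

  private static long readUInt32(byte[] b, int off, boolean le) {
    if (le) {
      return (b[off] & 0xffL) | ((b[off + 1] & 0xffL) << 8)
           | ((b[off + 2] & 0xffL) << 16) | ((b[off + 3] & 0xffL) << 24);
    }
    return ((b[off] & 0xffL) << 24) | ((b[off + 1] & 0xffL) << 16)
         | ((b[off + 2] & 0xffL) << 8) | (b[off + 3] & 0xffL);
  }
}

A nice side effect: SequenceFiles carry sync markers, so the converted trace splits cleanly across map tasks, which raw pcap does not (see Ari's point below).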
Another option is using a binary serialization package such as Avro, Thrift, or protobuf and writing the serialized form to HDFS. You would then need to write your own InputFormat/RecordReader for it, or wait for http://issues.apache.org/jira/browse/MAPREDUCE-377 or some other native support; a rough sketch of the InputFormat/RecordReader machinery follows below the quoted thread.

Will

On Wed, Jul 29, 2009 at 7:21 PM, Ariel Rabkin <[email protected]> wrote:
> I remember looking at this some months back.
>
> My recollection is that PCAP is a somewhat awkward format to
> MapReduce over, since it isn't splittable: you can't find record
> boundaries if you start at a random offset.
>
> You may want to do some sort of preprocessing before you upload your
> logs to HDFS to fix this. Irritatingly, the existing code I've seen
> for processing PCAP files doesn't seem very friendly to parsing
> arbitrary packet-trace data in memory.
>
> --Ari
>
> On Tue, Jul 28, 2009 at 8:31 AM, Wasim Bari <[email protected]> wrote:
>> Hi,
>>
>> I have data in PCAP file format (packet capture for network traffic).
>> Is it possible to process this file in Hadoop in the same format? Or
>> is there any supporting tool over Hadoop to analyze data from PCAP
>> files?
>>
>> Bye
>>
>> Wasim
>
> --
> Ari Rabkin [email protected]
> UC Berkeley Computer Science Department
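To make the InputFormat/RecordReader route concrete, here is a minimal sketch against the old org.apache.hadoop.mapred API. For illustration it reads raw pcap records directly rather than an Avro/Thrift/protobuf stream, and it assumes a little-endian trace; the class name PcapInputFormat is made up. Note that isSplitable() returns false precisely because of the record-boundary problem Ari describes, so each map task gets a whole file.

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class PcapInputFormat extends FileInputFormat<LongWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;  // can't find pcap record boundaries at a random offset
  }

  @Override
  public RecordReader<LongWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new PcapRecordReader((FileSplit) split, job);
  }

  static class PcapRecordReader implements RecordReader<LongWritable, BytesWritable> {
    private final FSDataInputStream in;
    private final long end;

    PcapRecordReader(FileSplit split, JobConf job) throws IOException {
      Path path = split.getPath();
      FileSystem fs = path.getFileSystem(job);
      in = fs.open(path);
      end = split.getStart() + split.getLength();
      in.seek(24);  // skip the global header; a real reader would check the magic
    }

    public boolean next(LongWritable key, BytesWritable value) throws IOException {
      if (in.getPos() >= end) return false;
      byte[] hdr = new byte[16];    // ts_sec, ts_usec, incl_len, orig_len
      in.readFully(hdr);
      long tsSec   = le32(hdr, 0);
      long tsUsec  = le32(hdr, 4);
      int  inclLen = (int) le32(hdr, 8);
      byte[] packet = new byte[inclLen];
      in.readFully(packet);
      key.set(tsSec * 1000000L + tsUsec);
      value.set(packet, 0, inclLen);
      return true;
    }

    public LongWritable createKey() { return new LongWritable(); }
    public BytesWritable createValue() { return new BytesWritable(); }
    public long getPos() throws IOException { return in.getPos(); }
    public float getProgress() throws IOException {
      return end == 0 ? 1.0f : Math.min(1.0f, in.getPos() / (float) end);
    }
    public void close() throws IOException { in.close(); }

    private static long le32(byte[] b, int off) {
      return (b[off] & 0xffL) | ((b[off + 1] & 0xffL) << 8)
           | ((b[off + 2] & 0xffL) << 16) | ((b[off + 3] & 0xffL) << 24);
    }
  }
}

This inherits the non-splittability Ari mentions (one map per file), so for large traces the preprocessing he suggests is still worthwhile; converting to SequenceFiles up front, as in the first sketch, is one way to do it.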
