I remember looking at this some months back. My recollection is that PCAP is a somewhat awkward format to MapReduce, since it isn't splittable: if you start reading at a random offset, you can't find the record boundaries, because records carry no sync marker, only a length field in each record header.
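To make the boundary problem concrete: a classic libpcap file is a 24-byte global header followed by records, each with a 16-byte header whose captured-length field (`incl_len`) gives the size of the packet bytes that follow. The only way to find record N is to walk every record before it. Below is a minimal Python sketch, assuming the classic libpcap format (not pcapng), that parses a trace held in memory and re-emits each packet as one text line (seconds, fractional timestamp, base64 packet bytes). The function names `iter_packets` and `to_lines`, the `ts_frac` field name, and the tab-separated line layout are all hypothetical choices for illustration, not an existing tool's format.

```python
import base64
import struct
from typing import Iterator, Tuple

# Classic libpcap magic numbers (pcapng is a different format, not handled here).
MAGIC_USEC = 0xA1B2C3D4  # timestamps are seconds + microseconds
MAGIC_NSEC = 0xA1B23C4D  # timestamps are seconds + nanoseconds

GLOBAL_HEADER_LEN = 24  # magic, version, thiszone, sigfigs, snaplen, network
RECORD_HEADER_LEN = 16  # ts_sec, ts_frac, incl_len, orig_len


def iter_packets(data: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (ts_sec, ts_frac, packet_bytes) for each record of an in-memory trace.

    Records carry no sync marker, so record N can only be located by walking
    records 0..N-1 from the start, using each header's incl_len field.
    """
    if len(data) < GLOBAL_HEADER_LEN:
        raise ValueError("truncated global header")
    # The writer's byte order determines how the magic number reads.
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", data[:4])
        if magic in (MAGIC_USEC, MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic libpcap file")

    record_header = struct.Struct(endian + "IIII")
    offset = GLOBAL_HEADER_LEN
    while offset < len(data):
        if offset + RECORD_HEADER_LEN > len(data):
            raise ValueError("truncated record header")
        ts_sec, ts_frac, incl_len, _orig_len = record_header.unpack_from(data, offset)
        offset += RECORD_HEADER_LEN
        if offset + incl_len > len(data):
            raise ValueError("truncated packet data")
        yield ts_sec, ts_frac, data[offset:offset + incl_len]
        offset += incl_len


def to_lines(data: bytes) -> str:
    """Rewrite a trace as one 'ts_sec<TAB>ts_frac<TAB>base64(packet)' line per packet.

    Base64 output never contains a newline, so every newline is a record
    boundary and a reader starting at an arbitrary byte offset can
    resynchronize at the next newline.
    """
    return "".join(
        f"{ts_sec}\t{ts_frac}\t{base64.b64encode(pkt).decode('ascii')}\n"
        for ts_sec, ts_frac, pkt in iter_packets(data)
    )


if __name__ == "__main__":
    # Build a tiny little-endian trace: global header plus one 3-byte packet.
    header = struct.pack("<IHHiIII", MAGIC_USEC, 2, 4, 0, 0, 65535, 1)
    record = struct.pack("<IIII", 100, 5, 3, 3) + b"abc"
    print(to_lines(header + record), end="")  # prints "100\t5\tYWJj"
```

Because each output line stands alone, a file in this form can be split at newlines by Hadoop's default TextInputFormat. The trade-off is roughly a one-third size increase from base64; a binary container such as a Hadoop SequenceFile, which has its own sync markers, would avoid that.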
To get around this, you may want to do some sort of preprocessing on your logs before you upload them to HDFS. Irritatingly, the existing code I've seen for processing PCAP files doesn't seem very friendly to parsing arbitrary packet-trace data in memory.

--Ari

On Tue, Jul 28, 2009 at 8:31 AM, Wasim Bari wrote:
> Hi,
>
> I have data in PCAP file format (packet capture for network traffic). Is
> it possible to process this file in Hadoop in the same format? Or any supporting
> tool over Hadoop to analyze data from PCAP files?
>
> Bye
>
> Wasim

--
Ari Rabkin
UC Berkeley Computer Science Department
