It sounds like your source is in Avro? Or do you want to transform your
logs to Avro?
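For what it's worth, a custom Avro schema for log events can be quite
small. Something like this (the record and field names here are just an
illustration, not any standard schema):

```json
{
  "type": "record",
  "name": "LogEvent",
  "fields": [
    {"name": "timestamp", "type": "long"},
    {"name": "host", "type": "string"},
    {"name": "message", "type": "string"}
  ]
}
```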
> Or would I use the raw output format, logging serialized AVRO data in
> the message body and analyze it later in Hadoop?
I don't see any problem.
> Are there any problems with this? I could imagine that this won't work
> because hadoop is splitting after 64 mb?
The HDFS block size should be transparent to users; you wouldn't be
aware of it at all. If you write Avro to HDFS, I can imagine that later
on you will parse the Avro file(s) with a map/reduce job and do whatever
you want. I don't see why you would need to worry about the 64 MB block
size. Or did I miss anything?
Is this link helpful?
http://www.datasalt.com/blog/2011/07/hadoop-avro/
On 10/25/2011 09:02 AM, Tobias Schlottke wrote:
Hi there,
sorry for the newbie question.
I really want to write Logging data in a custom AVRO schema.
Is it possible to extend the standard schema?
Or would I use the raw output format, logging serialized AVRO data in
the message body and analyze it later in Hadoop?
Are there any problems with this? I could imagine that this won't work
because hadoop is splitting after 64 mb?
Do we have to implement a custom source?
What is the most elegant solution for this?
Best,
Tobias