Hi Chuck.
Thanks a lot for your answer.
> Depending on your application that's either no big deal or a deal
> breaker.
Oops (big OOPS!). This is really bad news for me.
I see only two solutions: either pre-process my files and put each
record on a single line (so the split boundary is no longer a problem),
or switch to Java and write a RecordReader, as you suggested, so that
the records are read properly. But in the latter case I suppose I can no
longer send the records to my EXE file (through streaming). Right?
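For what it's worth, a minimal sketch of the pre-processing option, assuming FASTA-style input where every record starts with a ">" header line followed by one or more sequence lines (the actual file format isn't stated here, and the function name `flatten_fasta` and the tab separator are hypothetical choices). Each record is rewritten as one line, so a split boundary can no longer fall inside a record:

```python
import sys
from typing import Iterable, Iterator, List, Optional


def flatten_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield one line per record: the header, a tab, then the joined sequence.

    Assumes FASTA-style records (hypothetical format). Blank lines are skipped;
    sequence lines appearing before the first header are discarded.
    """
    header: Optional[str] = None
    seq_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                # Emit the previous record before starting a new one.
                yield header + "\t" + "".join(seq_parts)
            header = line
            seq_parts = []
        else:
            seq_parts.append(line.strip())
    if header is not None:
        # Emit the final record.
        yield header + "\t" + "".join(seq_parts)


if __name__ == "__main__":
    # Read the original file on stdin, write single-line records to stdout.
    for record in flatten_fasta(sys.stdin):
        print(record)
```

Run once over the input (e.g. `python flatten.py < input.fa > input.oneline`) before copying it into HDFS; the EXE would then need to accept, or be wrapped to re-expand, the one-line form.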
> Mind to share some background on your application?
Well, this is just the beginning. If it doesn't work, there won't be
much to tell :(
We need to build a new sequence processing tool that is supposed to replace an old tool (which can no longer handle large amounts of data). To start, we want to see whether Hadoop can be used to run existing biology tools in parallel to speed up the whole process. More precisely, we want to replace cluster management software (like Sun Grid Engine) with Hadoop. Later, we were supposed to add features to post-process the data generated by those biology tools.
As I said, these are only plans for now. We will see what happens. If it
works, I will come back and post a link to what we have achieved.
Gabriel