Hi folks, I have a huge text file (terabytes in size) that contains multiline records. We are not told how many lines each record takes: one record may span 5 lines, another 6, another 4. The number of lines varies from record to record. Since we cannot use the default TextInputFormat, we have written our own InputFormat and a custom RecordReader, but our confusion is this:
"When the file is split, there is no guarantee that each split will contain only complete records. Part of a record can end up in split 1 and the rest in split 2." This is not what we want. Can anyone suggest how to handle this scenario so that we can guarantee that one full record goes into a single InputSplit? Any workaround or hint would be really useful. Thanks in advance. DR
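
To make the scenario concrete, here is a minimal sketch of one possible direction, written as a plain-Java simulation with no Hadoop dependency. It relies on an assumption that is not stated above: that the first line of every record can be recognized, here by a hypothetical prefix `REC`. Under that assumption a reader can follow the same boundary convention that line-oriented readers commonly use: a split owns every record whose first line begins inside its byte range, it skips leading lines that belong to a record started earlier, and it reads past its own end to finish the last record it owns. The class name, method names, and the marker are all illustrative, not part of any Hadoop API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SplitBoundarySketch {
    // ASSUMPTION (hypothetical): every record begins with a line that has this prefix.
    static final String RECORD_START = "REC";

    /** Returns the records owned by the byte range [start, end) of data. */
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start > 0) {
            // Back up one byte so that a line beginning exactly at 'start' is kept,
            // then advance to the first line start at or after 'start'.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++;
        }
        StringBuilder current = null; // record being assembled, if any
        while (pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') lineEnd++;
            String line = new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8);
            if (line.startsWith(RECORD_START)) {
                if (current != null) {          // previous record is complete
                    records.add(current.toString());
                    current = null;
                }
                if (pos >= end) break;          // this record belongs to a later split
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line); // may run past 'end' on purpose
            }
            // else: continuation of a record owned by an earlier split, skip it
            pos = lineEnd + 1;
        }
        if (current != null) records.add(current.toString());
        return records;
    }

    /** Cuts data into fixed-size byte ranges and reads each one independently. */
    static List<String> readAll(byte[] data, int splitSize) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            all.addAll(readSplit(data, start, end));
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] data = "REC 1\na\nb\nREC 2\nc\nREC 3\nd\ne\nf\n"
                .getBytes(StandardCharsets.UTF_8);
        // Split size 7 cuts through the middle of records.
        for (String record : readAll(data, 7)) {
            System.out.println("[" + record.replace("\n", " | ") + "]");
        }
    }
}
```

Note that this does not place a whole record inside one split's byte range; it makes sure each record is processed by exactly one reader, which is usually what matters. If no record-start marker can be identified in the data, this sketch does not apply as written.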