I'm not sure what you mean by "flat format" here.
In my scenario, I have a file, input.xml, that looks like this:
<myfile>
<section>
<value>1</value>
</section>
<section>
<value>2</value>
</section>
</myfile>
input.xml is a plain text file, not a sequence file. If I read it with
XMLInputFormat, my mapper gets called with (key, value) pairs that look like
this:
(nnnn, <section><value>1</value></section>)
(nnnn, <section><value>2</value></section>)
where the keys are byte offsets into the file. I then write those
(key, value) pairs out to a sequence file. So my Hadoop job that uses
XMLInputFormat takes a text file as input and produces a sequence file as
output.
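For concreteness, here's a rough, self-contained sketch of the kind of
splitting that happens under the hood. The class and method names are mine,
not the actual XMLInputFormat code, and it just scans for a start/end tag
pair and records the offset of each match:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only -- not the real XMLInputFormat implementation.
// It scans a plain-text XML string for <section>...</section> fragments
// and emits "(offset, fragment)" strings, where offset is the position of
// the fragment's start tag, mimicking the (key, value) pairs the mapper sees.
public class XmlSplitSketch {

    static List<String> split(String xml, String start, String end) {
        List<String> pairs = new ArrayList<>();
        int from = 0;
        while (true) {
            int s = xml.indexOf(start, from);
            if (s < 0) break;                          // no more sections
            int e = xml.indexOf(end, s) + end.length(); // end of this fragment
            // key = offset of the fragment, value = the fragment itself
            pairs.add("(" + s + ", " + xml.substring(s, e).replace("\n", "") + ")");
            from = e;                                   // resume after the fragment
        }
        return pairs;
    }

    public static void main(String[] args) {
        String xml = "<myfile>\n"
                   + "<section>\n<value>1</value>\n</section>\n"
                   + "<section>\n<value>2</value>\n</section>\n"
                   + "</myfile>\n";
        // prints one (offset, fragment) pair per <section> element
        for (String p : split(xml, "<section>", "</section>")) {
            System.out.println(p);
        }
    }
}
```

In the real job, of course, the offsets are positions within the input split
rather than string indexes, and the pairs go to a SequenceFile writer instead
of stdout.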
I don't know a rule of thumb for how many small files is too many. Maybe
someone else on the list can chime in. I just know that when your
throughput gets slow, that's one possible cause to investigate.