This is the most important thing that you have said. The map function is called once per unit of input, but the mapper object persists across many units of input.
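To make the distinction concrete, here is a minimal sketch in plain Java that simulates the lifecycle rather than using Hadoop itself. CountingMapper and runSplit are hypothetical names made up for this illustration. The setup/map/cleanup method names only mirror the lifecycle hooks of Hadoop's newer Mapper API, and the loop stands in for what the framework does for one piece of input.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a mapper; this is NOT the real Hadoop Mapper class.
class CountingMapper {
    int setupCalls = 0;    // how many times the object was initialized
    int mapCalls = 0;      // how many input records it has seen
    int cleanupCalls = 0;  // how many times it was shut down

    void setup() { setupCalls++; }

    // Called once per unit of input (here: one line of text).
    void map(long offset, String line) { mapCalls++; }

    void cleanup() { cleanupCalls++; }
}

public class MapperLifecycleDemo {
    // Simulates the framework processing one piece of the input:
    // one mapper object is created, then map() is called for every record.
    static CountingMapper runSplit(List<String> records) {
        CountingMapper mapper = new CountingMapper();
        mapper.setup();
        long offset = 0;
        for (String record : records) {
            mapper.map(offset, record);
            offset += record.length() + 1; // assume a one-character line terminator
        }
        mapper.cleanup();
        return mapper;
    }

    public static void main(String[] args) {
        CountingMapper m = runSplit(Arrays.asList("first line", "second line", "third line"));
        System.out.println("setup=" + m.setupCalls + " map=" + m.mapCalls + " cleanup=" + m.cleanupCalls);
    }
}
```

Because the same object receives every map call, any field you set on it (a counter, a cache, an open connection) is still there on the next call.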
You have some control over how many mapper objects there are, how many machines they are created on, and how many pieces your input is broken into. That control is limited, however, unless you build your own input format. The standard input formats are optimized for very large inputs and may not give you the flexibility that you want for your experiments.

That is unfortunate for the purpose of learning about Hadoop, but Hadoop is designed mostly for dealing with very large data, not for being easy to understand. Where easy coincides with powerful, easy is good, but powerful isn't always easy.

On Sunday, February 20, 2011, maha <[email protected]> wrote:
> So first question: is there a difference between Mappers and maps?
