Hi,

The partition count depends on the following factors:
1. How big your input data is
2. Your cluster size
3. The desired processing speed
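The three factors above can be combined into a rough starting estimate. This is a minimal sketch under stated assumptions, not an Apex/Malhar API: the method `estimate` and its parameters are hypothetical. `bytesPerPartition` stands for how much data one partition should handle to meet your desired speed, and `maxPartitions` for how many containers your cluster can host.

```java
/**
 * Back-of-envelope partition count estimate.
 * Hypothetical heuristic for illustration; not part of Apex or Malhar.
 */
public final class PartitionEstimate {

  /**
   * @param inputBytes        expected input size (factor 1)
   * @param bytesPerPartition data one partition should handle to meet the desired speed (factor 3)
   * @param maxPartitions     partitions the cluster can host (factor 2)
   * @return a partition count between 1 and maxPartitions
   */
  public static int estimate(long inputBytes, long bytesPerPartition, int maxPartitions) {
    if (inputBytes < 0 || bytesPerPartition <= 0 || maxPartitions <= 0) {
      throw new IllegalArgumentException("sizes must be non-negative and limits positive");
    }
    // Ceiling division: enough partitions to cover the input at the target per-partition load.
    long needed = (inputBytes + bytesPerPartition - 1) / bytesPerPartition;
    // Never go below one partition, and never above what the cluster can host.
    return (int) Math.max(1L, Math.min(needed, (long) maxPartitions));
  }

  public static void main(String[] args) {
    long gb = 1L << 30;
    System.out.println(estimate(10 * gb, 2 * gb, 8));  // 5: ten GB at ~2 GB per partition
    System.out.println(estimate(100 * gb, 2 * gb, 8)); // 8: capped by cluster size
    System.out.println(estimate(0, 2 * gb, 8));        // 1: always at least one partition
  }
}
```

The returned number is the kind of value you might put after the colon in the `StatelessPartitioner:1` setting quoted below. Overshooting it only leaves containers idle once the few input files are done, while undershooting it makes processing slow.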
If you set the partition count too high and you have only a couple of files to process, the partitions (i.e. containers) will sit idle after the input files are processed. If it is too low, your processing will be slow. If you can predict your input traffic in advance, you can choose the partition count up front.

Because input volume usually varies, there is also dynamic partitioning, where partitions are added or removed based on input volume. Once the input is processed, the partitions are removed and the DAG shrinks until more data is available for processing.

FileSplitter only splits the file metadata; BlockReader actually reads the blocks. So in your application it is BlockReader that should have partitions, since it does the work of reading the data.

-Priyanka

On Wed, Oct 14, 2015 at 6:55 PM, Chiru wrote:

> What are all the parameters to consider when setting the partition count? Can we give any random number, or should it be based on cluster size, file size or block size?
>
> Please brief me on setting the partition count. How does it process the file/block?
>
> Thanks, Chiru
>
> <property>
>   <name>dt.application.<appName>.operator.<operatorName>.attr.PARTITIONER</name>
>   <value>com.datatorrent.common.partitioner.StatelessPartitioner:1</value>
> </property>
>
> On Friday, 9 October 2015 18:33:06 UTC+5:30, Chiru wrote:
>
>> Hi All,
>>
>> How can I find out when the entire file has been read when using the FileSplitter? I have to wait until EOF and then start processing.
>>
>> Please share sample code if possible.
>>
>> Thanks, Chiru
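The reply describes two mechanisms: a splitter that emits only block metadata for reader partitions to consume, and a reader partition count that grows or shrinks with the backlog. The sketch below illustrates both under stated assumptions. `BlockMetadata`, `split`, and `desiredReaders` are hypothetical names, not the real Malhar FileSplitter/BlockReader classes or the Apex Partitioner API, and the path and thresholds in `main` are made up.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of the FileSplitter/BlockReader division of work
 * and of load-based (dynamic) repartitioning. Not the Malhar or Apex API.
 */
public final class DynamicPartitionSketch {

  /** What a splitter emits: a description of a block, not its data. */
  public static final class BlockMetadata {
    final String path;
    final long offset;
    final long length;

    BlockMetadata(String path, long offset, long length) {
      this.path = path;
      this.offset = offset;
      this.length = length;
    }
  }

  /** Splitter side: describes the blocks of a file without reading its contents. */
  public static List<BlockMetadata> split(String path, long fileSize, long blockSize) {
    List<BlockMetadata> blocks = new ArrayList<>();
    for (long off = 0; off < fileSize; off += blockSize) {
      // The last block may be shorter than blockSize.
      blocks.add(new BlockMetadata(path, off, Math.min(blockSize, fileSize - off)));
    }
    return blocks;
  }

  /** Reader side: how many reader partitions to run for the current backlog. */
  public static int desiredReaders(int pendingBlocks, int blocksPerReader, int minReaders, int maxReaders) {
    if (blocksPerReader <= 0 || minReaders < 0 || maxReaders < minReaders) {
      throw new IllegalArgumentException("invalid limits");
    }
    if (pendingBlocks == 0) {
      return minReaders; // input drained: shrink to the (hypothetical) floor
    }
    // Scale up with the backlog, bounded by what the cluster allows.
    int needed = (pendingBlocks + blocksPerReader - 1) / blocksPerReader;
    return Math.max(minReaders, Math.min(needed, maxReaders));
  }

  public static void main(String[] args) {
    long mb = 1L << 20;
    List<BlockMetadata> blocks = split("/data/in/file1.log", 1000 * mb, 128 * mb);
    System.out.println(blocks.size());                           // 8 blocks
    System.out.println(desiredReaders(blocks.size(), 2, 1, 16)); // 4 readers
    System.out.println(desiredReaders(500, 2, 1, 16));           // 16: capped
    System.out.println(desiredReaders(0, 2, 1, 16));             // 1: shrunk after input drains
  }
}
```

The splitter stays cheap because it only computes offsets, which is why the reader, not the splitter, is the operator worth partitioning.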
