Sorry if this question is common. I looked through the docs, code, and mail archives and did not find anything that answered these questions.

Say I have three files, A, B, and C. Each file holds a set of records I want to parse, and the records are aligned by position across files, i.e. the second record in A maps to the second record in B, which maps to the second record in C. However, the record lengths differ from file to file, so the file sizes and block counts differ as well. Depending on the job run, I want to read one, two, or all three of the files. What I would like is for corresponding records from each file to end up on the same host, so that access is always local. Ideally each file would have its own block size, chosen so that the first block of A holds the same number of records as the first block of B, and so on. So my questions are:

1) I notice that when creating a file I can specify a block size. If the records are fixed size, that would let me manually arrange equal record counts per block. But is this just a hint to the system? Will it always be honored, or could the system use a different block size under certain conditions?

2) Even if I can get the record counts to line up across the files, is there a way to make sure that corresponding blocks from each file are placed on the same node? If so, is there a way to keep them from being separated when the system rebalances data blocks?
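
To make the arithmetic behind question 1 concrete, here is a minimal sketch of what I have in mind, assuming fixed-size records. The record lengths, the records-per-block value, and the function names are all made up for illustration; they are not taken from any real API, and the sketch ignores any alignment constraints the file system might place on block sizes.

```python
def block_sizes(record_lengths, records_per_block):
    """Return a per-file block size in bytes such that every block
    of every file holds exactly `records_per_block` records."""
    if records_per_block <= 0:
        raise ValueError("records_per_block must be positive")
    return {name: length * records_per_block
            for name, length in record_lengths.items()}


def block_index(record_number, records_per_block):
    """Return the zero-based block that holds a zero-based record number.
    With aligned block sizes, the answer is the same for every file."""
    return record_number // records_per_block


# Hypothetical fixed record lengths in bytes for files A, B, and C.
record_lengths = {"A": 100, "B": 250, "C": 40}
records_per_block = 1_000_000

sizes = block_sizes(record_lengths, records_per_block)
for name, size in sizes.items():
    print(f"file {name}: block size {size} bytes")
```

With block sizes chosen this way, record n falls in block n // records_per_block in all three files, which is why I would then want block k of A, B, and C to live on the same node.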

Thanks for any help.