Sorry my first email wasn't clearer (and had missing words). My question was: what is the maximum direct byte throughput of the underlying filesystem you're reading against (when not cached)? Let's call that the Optimal case. One way to measure this might be to run several hdfs dfs -cat "file" > /dev/null commands in parallel.
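
A rough sketch of how that parallel read could be timed, in Python. The helper names (count_bytes, parallel_read_rate, hdfs_cat_commands) are made up for illustration, and this assumes the hdfs client is on the PATH and that the files are not already in the page cache. Piping through Python adds some overhead of its own, so treat the result as a lower bound on the Optimal rate rather than an exact figure.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def count_bytes(argv, chunk_size=1 << 20):
    """Run argv and count the bytes it writes to stdout, discarding them."""
    total = 0
    with subprocess.Popen(argv, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"{argv!r} exited with status {proc.returncode}")
    return total


def parallel_read_rate(commands):
    """Run all commands at once; return (total_bytes, seconds, MB_per_second)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        sizes = list(pool.map(count_bytes, commands))
    elapsed = time.monotonic() - start
    total = sum(sizes)
    return total, elapsed, total / elapsed / 1e6


def hdfs_cat_commands(paths):
    """Hypothetical helper: one 'hdfs dfs -cat <path>' command per file."""
    return [["hdfs", "dfs", "-cat", path] for path in paths]
```

Usage would be something like parallel_read_rate(hdfs_cat_commands(list_of_files)), with one file per reader thread you want to simulate.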

The second question is about kernel, user, and IO wait time per workload, so that we could get a snapshot something like this:

| Reader    | Transfer Rate | Kernel | User | IO |
| Drill 1.7 |               |        |      |    |
| Other     |               |        |      |    |
| Solo      |               |        |      |    |
| Optimal   |               |        |      |    |

If the specific kernel and user times are too difficult to get (mostly in the 1.7 and Other cases, probably), maybe just IO wait, CPU load, and total test duration for a fixed workload for each would suffice?

Even if this isn't possible, there's lots of great stuff in what you put together. I was just trying to understand the bounding box.

thanks,
Jacques

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Jul 25, 2016 at 3:17 PM, Parth Chandra <[email protected]> wrote:

> Didn't quite catch your question there. But I do have the following numbers
> from the file system -
>
>                         | AvgIOR OpSize (KB) | Estimated Ops/Disk
> Drill 1.7.0 - uncached  | 239                | 103
> Solo Uncached           | 240                | 281
>
> The numbers are approximate as these are captured by scripts on all the
> nodes and then averaged by another script.
>
> Solo is close to as fast as is possible from disk.
>
> Is that what you were looking for?
>
