Don't have the numbers for kernel and user times, but let me see if I can dig out the other numbers. Kunal did a bunch of the work (I've updated the doc to reflect his contribution :) ). The purpose of Solo was, in fact, to establish how fast I could drive the file system after eliminating the decoding and decompression of data, while still reading a Parquet file the way we would want to in Drill. If we have the numbers, I'll add them to the doc.
On Tue, Jul 26, 2016 at 5:11 PM, Jacques Nadeau <[email protected]> wrote:

> Sorry my first email wasn't clearer (and had missing words).
>
> My question was: what is the maximum direct byte throughput of the
> underlying filesystem you're reading against (when not cached)? Let's call
> that the Optimal case. One way to measure this might be a parallel
> hdfs dfs -cat "file" > /dev/null.
>
> The second question is the kernel, user, and io wait time per workload, so
> we could get a snapshot something like this:
>
> | Reader    | Transfer Rate | Kernel | User | IO |
> | Drill 1.7 |               |        |      |    |
> | Other     |               |        |      |    |
> | Solo      |               |        |      |    |
> | Optimal   |               |        |      |    |
>
> If the specific kernel and user times are too difficult to get (probably
> so in the 1.7 and Other cases), maybe just io wait, cpu load, and total
> test duration for a fixed workload for each would suffice?
>
> Even if this isn't possible, there's lots of great stuff in what you put
> together. Was just trying to understand the bounding box.
>
> thanks,
> Jacques
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Jul 25, 2016 at 3:17 PM, Parth Chandra <[email protected]> wrote:
> >
> > Didn't quite catch your question there. But I do have the following
> > numbers from the file system -
> >
> > | Reader                 | Avg IOR Op Size (KB) | Estimated Ops/Disk |
> > | Drill 1.7.0 - uncached | 239                  | 103                |
> > | Solo - uncached        | 240                  | 281                |
> >
> > The numbers are approximate, as these are captured by scripts on all
> > the nodes and then averaged by another script.
> >
> > Solo is close to as fast as is possible from disk.
> >
> > Is that what you were looking for?
> >
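For reference, here is a minimal sketch of the measurement Jacques describes: drive a parallel "hdfs dfs -cat <file> > /dev/null" across several files, and capture the user/kernel/iowait split over the run from Linux /proc/stat. The file paths and degree of parallelism below are hypothetical placeholders, not values from this thread.

    #!/usr/bin/env python3
    # Sketch only: aggregate read throughput of parallel
    # "hdfs dfs -cat <file> > /dev/null" runs, plus the user/kernel/iowait
    # CPU split over the run (first line of Linux /proc/stat).
    # FILES and PARALLELISM are hypothetical placeholders.
    import subprocess
    import time
    from concurrent.futures import ThreadPoolExecutor

    FILES = ["/data/parquet/part-%05d.parquet" % i for i in range(8)]
    PARALLELISM = 8

    def cpu_times():
        # /proc/stat line 1: "cpu user nice system idle iowait irq ..."
        with open("/proc/stat") as f:
            fields = f.readline().split()[1:6]
        user, nice, system, idle, iowait = (int(x) for x in fields)
        return user + nice, system, idle, iowait

    def cat_file(path):
        # Stream the file through the HDFS client, discarding the bytes,
        # then ask HDFS for the file size so we can compute bytes moved.
        with open("/dev/null", "wb") as devnull:
            subprocess.run(["hdfs", "dfs", "-cat", path],
                           stdout=devnull, check=True)
        du = subprocess.run(["hdfs", "dfs", "-du", path], check=True,
                            capture_output=True, text=True).stdout
        return int(du.split()[0])  # first field is the size in bytes

    u0, k0, i0, w0 = cpu_times()
    start = time.time()
    with ThreadPoolExecutor(max_workers=PARALLELISM) as pool:
        nbytes = sum(pool.map(cat_file, FILES))
    secs = time.time() - start
    u1, k1, i1, w1 = cpu_times()

    total = (u1 - u0) + (k1 - k0) + (i1 - i0) + (w1 - w0)
    print("throughput: %.1f MB/s" % (nbytes / secs / 1e6))
    print("user %.1f%%  kernel %.1f%%  iowait %.1f%%"
          % (100.0 * (u1 - u0) / total,
             100.0 * (k1 - k0) / total,
             100.0 * (w1 - w0) / total))

As a rough sanity check on the table above: if "Estimated Ops/Disk" is per second (an assumption; the units aren't stated in the thread), Solo's 281 ops x 240 KB works out to roughly 67 MB/s per disk, which is in the neighborhood of what a single spinning disk can sustain.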
