Re: [jira] [Commented] (DRILL-4800) Improve parquet reader performance

Jacques Nadeau Tue, 26 Jul 2016 17:12:06 -0700

Sorry my first email wasn't clearer (and had missing words).

My question was, what is the maximum direct byte throughput of the
underlying filesystem you're reading against (when not cached). Let's call
that the Optimal case. One way to measure this might be to run several
hdfs dfs -cat "file" > /dev/null commands in parallel.
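
As a rough sketch of that measurement (not something from the original thread), the parallel cat could be driven from Python along these lines. The function names `drain` and `parallel_throughput` are made up for illustration, and the `hdfs dfs -cat` form and file paths in the usage comment are assumptions. For an uncached number, the OS page cache would need to be cold on the data nodes first.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def drain(cmd, chunk_size=1 << 20):
    """Run cmd, discard its stdout, and return the number of bytes it produced."""
    total = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"{cmd!r} exited with status {proc.returncode}")
    return total


def parallel_throughput(commands):
    """Run all commands concurrently; return (total_bytes, seconds, MB_per_second)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        total = sum(pool.map(drain, commands))
    elapsed = time.monotonic() - start
    return total, elapsed, total / elapsed / 1e6


# Hypothetical usage (paths are placeholders, one reader per file):
# commands = [["hdfs", "dfs", "-cat", p] for p in ["/data/part-0", "/data/part-1"]]
# total, seconds, mb_per_s = parallel_throughput(commands)
```

The aggregate rate from a run like this would fill the Transfer Rate cell of the Optimal row.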

The second question is the kernel, user and io wait time per workload, so we
could get a snapshot something like this:

| Reader    | Transfer Rate | Kernel | User | IO |
| Drill 1.7 |               |        |      |    |
| Other     |               |        |      |    |
| Solo      |               |        |      |    |
| Optimal   |               |        |      |    |

If the specific kernel and user times are too difficult (mostly in the 1.7
and other cases probably), maybe just io wait and cpu load and total test
duration for a fixed workload for each would suffice?
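
One possible way to collect the kernel, user and io wait split on a Linux node is to diff two samples of the aggregate `cpu` line in `/proc/stat`, one taken before the workload and one after. The sketch below assumes Linux and its usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the helper names are hypothetical, and this is not how the numbers elsewhere in this thread were gathered.

```python
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (the first line of /proc/stat)."""
    with open(path) as f:
        return f.readline()


def parse_cpu_line(line):
    """Map the first eight counters of a 'cpu' line to their names."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(CPU_FIELDS, map(int, fields[1:9])))


def cpu_breakdown(before, after):
    """Percent of CPU time spent in user, kernel and io wait between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: a[name] - b[name] for name in a}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {
        "user": 100.0 * (delta["user"] + delta["nice"]) / total,
        # Interrupt handling is counted as kernel time here (an assumption).
        "kernel": 100.0 * (delta["system"] + delta["irq"] + delta["softirq"]) / total,
        "io_wait": 100.0 * delta["iowait"] / total,
    }


# Hypothetical usage around a fixed workload:
# before = read_cpu_line()
# ... run the query ...
# print(cpu_breakdown(before, read_cpu_line()))
```

Sampling on every node and averaging would give one row of the table per reader.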

Even if this isn't possible, there's lots of great stuff in what you put
together. Was just trying to understand the bounding box.

thanks,
Jacques

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Jul 25, 2016 at 3:17 PM, Parth Chandra <[email protected]>
wrote:

> Didn't quite catch your question there. But I do have the following numbers
> from the file system -
>
>                        | AvgIOR OpSize (KB) | Estimated Ops/Disk
> Drill 1.7.0 - uncached | 239                | 103
> Solo Uncached          | 240                | 281
>
> The numbers are approximate as these are captured by scripts on all the
> nodes and then averaged by another script.
>
> Solo is close to as fast as is possible from disk.
>
> Is that what you were looking for?
>
