Github user LosD commented on the pull request:
https://github.com/apache/metamodel/pull/36#issuecomment-126828916
In many (most?) cases, I. believe. it. is. pretty. inefficient. to. have.
to. close. the. stream. for. every. single. row. In HDFS it is crippling.
It's not a problem for normal File operations, as you can easily buffer
before writing a block, but unless I misunderstood @cludiaPHI's investigations,
we do not have that luxury in MetaModel, as rows come in one at a time, and
there's no way to know when the sequence ends. We can of course do the
instanceof hack in the CSV writer, but do we want users of the Resource
interface to be handicapped by default?
I might have misunderstood something, but as far as I can see, HDFS is just
an extreme example of a general problem.
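To make the trade-off concrete, here is a minimal, hypothetical sketch of the "instanceof hack" mentioned above. The `Resource`, `FileResource`, and `writeRow` names are illustrative stand-ins, not MetaModel's actual API; the counters just make the per-row open/close cost visible.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class InstanceofHackSketch {

    // Illustrative stand-in (NOT MetaModel's real type): a resource that
    // can only hand out a fresh append stream on each call.
    interface Resource {
        OutputStream append() throws IOException;
    }

    static class GenericResource implements Resource {
        final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int opens = 0; // counts how many streams were opened

        public OutputStream append() {
            opens++;
            return new OutputStream() {
                public void write(int b) {
                    sink.write(b);
                }
            };
        }
    }

    // A file-backed subtype the writer can special-case: it keeps one
    // long-lived stream instead of reopening per row.
    static class FileResource extends GenericResource {
        OutputStream shared;

        OutputStream sharedStream() throws IOException {
            if (shared == null) {
                shared = append(); // opened exactly once
            }
            return shared;
        }
    }

    // The "instanceof hack": reuse one stream for FileResource, but pay
    // an open/close per row for every other Resource implementation --
    // the per-row cost that is crippling on HDFS.
    static void writeRow(Resource resource, String row) throws IOException {
        byte[] bytes = (row + "\n").getBytes(StandardCharsets.UTF_8);
        if (resource instanceof FileResource) {
            ((FileResource) resource).sharedStream().write(bytes);
        } else {
            try (OutputStream out = resource.append()) {
                out.write(bytes);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        FileResource file = new FileResource();
        GenericResource hdfsLike = new GenericResource();
        for (int i = 0; i < 3; i++) {
            writeRow(file, "row" + i);
            writeRow(hdfsLike, "row" + i);
        }
        System.out.println("file opens=" + file.opens);
        System.out.println("generic opens=" + hdfsLike.opens);
    }
}
```

The special case works, but only for callers that happen to pass the one blessed subtype; every other Resource implementation silently gets the slow path, which is the "handicapped by default" concern above.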
--
Best regards
Dennis Du Krøger
On 1 August 2015 00:19:22 CEST, "Kasper Sørensen"
<[email protected]> wrote:
>A few thoughts/notes:
>
>I think the CSV update callback actually has some FileResource-specific
>hacks to avoid appending again and again. Maybe you'll find the real
>root cause there!
>
>To LosD: Resource is not itself closeable, just like File is not. But
>the streams that you might get by invoking methods on the Resource are
>obviously closeable.
>
>---
>Reply to this email directly or view it on GitHub:
>https://github.com/apache/metamodel/pull/36#issuecomment-126826011