[
https://issues.apache.org/jira/browse/GOBBLIN-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Abhishek Tiwari updated GOBBLIN-21:
-----------------------------------
Sprint: Apache Gobblin 170724, Apache Gobblin 170807 (was: Apache Gobblin
170724)
> Can Gobblin limit the number of output files
> --------------------------------------------
>
> Key: GOBBLIN-21
> URL: https://issues.apache.org/jira/browse/GOBBLIN-21
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Abhishek Tiwari
> Assignee: Aditya Sharma
>
> Hi,
> I use Gobblin to read data from Kafka and create ORC files on HDFS. I notice
> that the number of generated ORC files is the same as the number of Kafka
> partitions. Is there any Gobblin configuration to control the number of
> generated output files?
> Thanks
>
> *Github Url* : https://github.com/linkedin/gobblin/issues/1894
> *Github Reporter* : *jeffwang66*
> *Github Created At* : 2017-05-23T17:54:40Z
> *Github Updated At* : 2017-05-23T22:35:46Z
> h3. Comments
> ----
> [~ibuenros] wrote on 2017-05-23T18:12:22Z : Hi,
> Try setting mr.job.max.mappers to the number of output files you want
> (note this is used by the Kafka source even in local mode).
> Best,
> Issac
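>
> A minimal sketch of the suggestion above, assuming a properties-style
> Gobblin job file. The value 4 is a hypothetical target file count, not a
> value from this thread, and whether it takes effect may depend on the
> Gobblin version and source in use:
>
> ```
> # Hypothetical example value: cap the number of mappers at 4.
> # Per the comment above, the Kafka source reads this even in local mode.
> mr.job.max.mappers=4
> ```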
>
>
> *Github Url* :
> https://github.com/linkedin/gobblin/issues/1894#issuecomment-303486064
> ----
> *jeffwang66* wrote on 2017-05-23T22:35:46Z : @ibuenros
> Hey Issac:
> I tried this property, but the number of output files is still the same as
> the number of Kafka partitions.
> I am using Gobblin 0.8.
> Thanks
>
> *Github Url* :
> https://github.com/linkedin/gobblin/issues/1894#issuecomment-303552260
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)