[ 
https://issues.apache.org/jira/browse/GOBBLIN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Sen updated GOBBLIN-1399:
-----------------------------
    Summary: provide a way to specify writer path and file name format via 
config  (was: provide a way to speficy writer path and file name format via 
config)

> provide a way to specify writer path and file name format via config
> --------------------------------------------------------------------
>
>                 Key: GOBBLIN-1399
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1399
>             Project: Apache Gobblin
>          Issue Type: New Feature
>          Components: gobblin-api
>    Affects Versions: 0.15.0
>            Reporter: Jay Sen
>            Assignee: Hung Tran
>            Priority: Major
>             Fix For: 0.16.0
>
>
> Currently Gobblin hard-codes how the writer's path and file name are built.
> There are only three ways to form them: namespace plus table name, table
> name only, and a default.
> {code:java}
> switch (getWriterFilePathType(state)) {
>   case NAMESPACE_TABLE:
>     // writer.file.path.format = <extract.namespace>/<extract.table.name>/
>     return getNamespaceTableWriterFilePath(state);
>   case TABLENAME:
>     // <extract.table.name>
>     return WriterUtils.getTableNameWriterFilePath(state);
>   default:
>     return WriterUtils.getDefaultWriterFilePath(state, numBranches, branchId);
> }
> {code}
>  
>  Filename:
> {code:java}
> namespace.replaceAll("\\.", "/") + "/" + table + "/" + extractId + "_"
>     + (isFull ? "full" : "append");
> {code}
>  
> There is no way for a user to add other parameters such as version or batchId.
> It would also be useful to allow any configuration value to be part of the
> writer path, defined by a format like this:
>  
> {code}
> extract.type = increments
> writer.file.path.format="<extract.table.name>/<extract.extract.id>/<extract.type>"
> writer.file.name.format="part.<writer_id>_batch_<dataset.batch_id>.<branch_id>.<format_extension>"
> {code}
> Note that values such as "dataset.batch_id" come from the runtime config
> (state.getProp()), so this allows flexible paths and file names for any
> use case.
> This will be enabled by a feature flag, so existing functionality
> remains the same.
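
The token substitution described in the issue could be sketched roughly as follows. This is a minimal illustration, not Gobblin's implementation: the class name WriterPathFormatSketch and the resolve method are hypothetical, a plain java.util.Properties stands in for the Gobblin state object, and failing on an unknown token is an assumed policy that the issue does not specify.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Gobblin.
public class WriterPathFormatSketch {
  // Matches tokens such as <extract.table.name> or <branch_id>.
  private static final Pattern TOKEN = Pattern.compile("<([A-Za-z0-9_.]+)>");

  /** Replaces every <key> token in format with props.getProperty(key). */
  public static String resolve(String format, Properties props) {
    Matcher m = TOKEN.matcher(format);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String key = m.group(1);
      String value = props.getProperty(key);
      if (value == null) {
        // Assumed policy: fail fast rather than write to a half-resolved path.
        throw new IllegalArgumentException("No value for token <" + key + ">");
      }
      // quoteReplacement keeps '$' and '\' in values from being interpreted.
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("extract.table.name", "orders");
    props.setProperty("extract.extract.id", "20210301");
    props.setProperty("extract.type", "increments");
    // Prints: orders/20210301/increments
    System.out.println(
        resolve("<extract.table.name>/<extract.extract.id>/<extract.type>", props));
  }
}
```

In this sketch, runtime values such as writer_id or branch_id would have to be placed in the same property set before resolving the file name format. A feature flag, as proposed above, would decide whether this path or the existing switch statement is used.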



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
