[
https://issues.apache.org/jira/browse/HIVE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
lvhu reassigned HIVE-27590:
---------------------------
Assignee: (was: lvhu)
> Make LINES TERMINATED BY work when creating table
> -------------------------------------------------
>
> Key: HIVE-27590
> URL: https://issues.apache.org/jira/browse/HIVE-27590
> Project: Hive
> Issue Type: Improvement
> Components: Hive, SQL
> Affects Versions: 3.1.3
> Environment: {code:java}
> // code placeholder
> {code}
> Reporter: lvhu
> Priority: Major
>
> *Currently, the only way to set a line delimiter when creating a table in Hive
> is to write a custom InputFormat and reference it in the DDL:*
> {code:java}
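> // Java: custom InputFormat that hard-codes the record delimiter
> package abc.hive;
> public class MyFstTextInputFormat extends FileInputFormat<LongWritable, Text>
>     implements JobConfigurable {
>   ...
> }
> -- HiveQL: table that uses the custom InputFormat
> create table test (
>   id string,
>   name string
> )
> STORED AS INPUTFORMAT 'abc.hive.MyFstTextInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; {code}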
> If tables use several different record delimiters, a separate custom
> TextInputFormat has to be written for each one.
> Unfortunately, the ideal syntax is not supported yet:
> {code:java}
> create table test (
> id string,
> name string
> )
> row format delimited fields terminated by '\t' -- supported
> LINES TERMINATED BY '|@|' ; -- not supported {code}
> I have a solution that supports setting the line delimiter at table creation,
> as in the statement above.
> *1. Create a new HiveTextInputFormat class to replace the TextInputFormat class.*
> The HiveTextInputFormat class reads the <pathToDelimiter> file and selects the
> record delimiter for each input file based on the prefix of its path.
> {code:java}
> public class HiveTextInputFormat extends FileInputFormat<LongWritable, Text>
>     implements JobConfigurable {
>   ....
>   public RecordReader<LongWritable, Text> getRecordReader(
>       InputSplit genericSplit, JobConf job, Reporter reporter)
>       throws IOException {
>     reporter.setStatus(genericSplit.toString());
>     // Default delimiter, taken from the job configuration
>     String delimiter = job.get("textinputformat.record.delimiter");
>     // Path of the file backing this split (getPath() is defined on FileSplit)
>     FileSplit fileSplit = (FileSplit) genericSplit;
>     String filePath = fileSplit.getPath().toUri().getPath();
>     // Mapping from path prefix to delimiter, obtained by parsing the
>     // <pathToDelimiter> file
>     Map<String, String> pathToDelimiterMap = parsePathToDelimiter();
>     for (Map.Entry<String, String> entry : pathToDelimiterMap.entrySet()) {
>       // Configured path prefix
>       String configPath = entry.getKey();
>       // If configPath is a prefix of filePath, use the delimiter
>       // configured for that path
>       if (filePath.startsWith(configPath)) {
>         delimiter = entry.getValue();
>       }
>     }
>     byte[] recordDelimiterBytes = null;
>     if (null != delimiter) {
>       recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
>     }
>     return new LineRecordReader(job, fileSplit, recordDelimiterBytes);
>   }
> } {code}
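A note on the prefix lookup in step 1: when several configured prefixes match the same file, the loop keeps whichever entry the map happens to iterate last, so the result depends on iteration order. The following is only a minimal sketch of one way to make the choice deterministic, by preferring the longest matching prefix. The class name `DelimiterResolver` and the method `resolve` are hypothetical; they are not part of Hive or of the proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Hive: picks a record delimiter by path prefix. */
public class DelimiterResolver {

    /**
     * Returns the delimiter of the longest configured prefix that matches
     * filePath, or defaultDelimiter (which may be null) when nothing matches.
     */
    public static String resolve(Map<String, String> pathToDelimiter,
                                 String filePath,
                                 String defaultDelimiter) {
        String bestPrefix = null;
        String delimiter = defaultDelimiter;
        for (Map.Entry<String, String> entry : pathToDelimiter.entrySet()) {
            String prefix = entry.getKey();
            boolean longer = bestPrefix == null || prefix.length() > bestPrefix.length();
            if (filePath.startsWith(prefix) && longer) {
                bestPrefix = prefix;
                delimiter = entry.getValue();
            }
        }
        return delimiter;
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("/warehouse/db/test", "|@|");
        map.put("/warehouse/db", "\n");
        // The more specific prefix wins regardless of insertion order.
        System.out.println(resolve(map, "/warehouse/db/test/part-00000", null));
    }
}
```

Note that a plain `startsWith` also treats `/warehouse/db/test` as a prefix of `/warehouse/db/test2/...`; a real implementation would probably need to compare on path-component boundaries.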
> *2. Modify the Hive CREATE TABLE handling to support <LINES TERMINATED BY>.*
> {code:java}
> create table test (
> id string,
> name string
> )
> LINES TERMINATED BY '|@|'
> LOCATION hdfs_path; {code}
> When a user executes the SQL above, Hive will add the entry (hdfs_path, '|@|')
> to the <pathToDelimiter> file.
> HiveTextInputFormat would be set as the default INPUTFORMAT.
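The description does not say how the <pathToDelimiter> file is laid out. As a rough sketch only, assuming it is stored in the standard `java.util.Properties` text format (an assumption, chosen because that format already escapes characters such as `:`, `=`, and line breaks), writing and reading entries could look like this. The class `PathToDelimiterFile` and its methods are hypothetical, and the sketch works on strings instead of a real HDFS file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch; the on-disk format of <pathToDelimiter> is an assumption. */
public class PathToDelimiterFile {

    /** Serializes (path, delimiter) entries in java.util.Properties text format. */
    public static String write(Properties entries) {
        StringWriter out = new StringWriter();
        try {
            entries.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Parses content previously produced by write(). */
    public static Properties read(String content) {
        Properties entries = new Properties();
        try {
            entries.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        Properties entries = new Properties();
        // Entry that CREATE TABLE ... LINES TERMINATED BY '|@|' would register.
        entries.setProperty("/user/hive/warehouse/test", "|@|");
        Properties loaded = read(write(entries));
        System.out.println(loaded.getProperty("/user/hive/warehouse/test"));
    }
}
```

Whatever format is chosen, it has to round-trip delimiters that contain control characters, since a record delimiter such as `\r\n` is a plausible value.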
> Looking forward to receiving your suggestions and feedback!
> *If you accept my idea, I hope you can assign the task to me. My GitHub
> account is: _lvhu-goodluck_*
> I really hope to contribute code to the community.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)