zhaorongsheng opened a new pull request, #62439:
URL: https://github.com/apache/doris/pull/62439

   ### What problem does this PR solve?
   
   Issue Number: N/A
   
   Problem Summary:
   When a Hive table is created with `com.hadoop.compression.lzo.LzoTextInputFormat`
   or `com.hadoop.mapred.DeprecatedLzoTextInputFormat` as the InputFormat, the Doris
   Hive Catalog throws `NotSupportedException` and cannot query the table.
   
   Both InputFormats are provided by the hadoop-lzo library and produce standard
   LZO-compressed text files (`.lzo`), which Doris BE already supports via the
   existing `LzopDecompressor` and `TextReader`.
   
   Two FE-side fixes are applied:
   1. **HMSExternalTable.java** – Add both InputFormats to the `SUPPORTED_HIVE_FILE_FORMATS` whitelist so the table passes format validation.
   2. **HiveUtil.java** – Mark any InputFormat whose class name contains `"lzo"` as non-splittable. LZO files have no global index and cannot be read from an arbitrary byte offset; sending a split with `start_offset > 0` to BE would cause decompression failure. This single check also covers any future LZO InputFormat variants.
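
The splittability rule in item 2 can be sketched roughly as below. This is a minimal illustration, not the code in `HiveUtil.java`: the class name `LzoSplitSketch` and the method `isSplittable` are hypothetical, and both the case-insensitive match and the conservative handling of `null` are assumptions. Note that `DeprecatedLzoTextInputFormat` lives in `com.hadoop.mapred` and contains `Lzo` rather than `lzo`, so a case-sensitive substring test would not match it.

```java
import java.util.Locale;

// Hypothetical sketch of the "contains lzo => non-splittable" rule.
// Names here are illustrative and are not taken from HiveUtil.java.
final class LzoSplitSketch {
    private LzoSplitSketch() {}

    /**
     * Returns false when the InputFormat class name contains "lzo".
     * Assumption: the comparison ignores case, so "Lzo" also matches.
     * Assumption: a null class name is treated as non-splittable.
     */
    static boolean isSplittable(String inputFormatClassName) {
        if (inputFormatClassName == null) {
            return false;
        }
        String lower = inputFormatClassName.toLowerCase(Locale.ROOT);
        return !lower.contains("lzo");
    }

    public static void main(String[] args) {
        String[] formats = {
            "com.hadoop.compression.lzo.LzoTextInputFormat",
            "com.hadoop.mapred.DeprecatedLzoTextInputFormat",
            "org.apache.hadoop.mapred.TextInputFormat",
        };
        for (String format : formats) {
            System.out.println(format + " splittable=" + isSplittable(format));
        }
    }
}
```

With a check of this shape, each LZO file would be planned as a single split starting at offset 0, which trades scan parallelism within a file for correct decompression.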
   
   No BE changes are needed: `LzopDecompressor` and `TextReader` already handle `FORMAT_TEXT + LZOP` correctly.
   
   ### Root cause analysis
   
   | Layer | Problem |
   |---|---|
   | FE – whitelist | `LzoTextInputFormat` / `DeprecatedLzoTextInputFormat` not in `SUPPORTED_HIVE_FILE_FORMATS` → `NotSupportedException` |
   | FE – splittability | All whitelisted formats were marked splittable; LZO files must be non-splittable |
   | BE | Already supports LZOP decompression and TEXT format reading (no change needed) |
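
The whitelist failure in the first row can be illustrated with a small sketch. It is not the code in `HMSExternalTable.java`: the class `HiveFormatWhitelistSketch` and the method `checkSupported` are hypothetical, the three non-LZO entries are only examples of common Hive InputFormats rather than the actual contents of the whitelist, and `UnsupportedOperationException` stands in for Doris's `NotSupportedException`.

```java
import java.util.Set;

// Hypothetical sketch of a whitelist-based format check.
// Not the actual Doris implementation.
final class HiveFormatWhitelistSketch {
    private HiveFormatWhitelistSketch() {}

    // Illustrative whitelist; the last two entries are the ones this PR adds.
    static final Set<String> SUPPORTED_HIVE_FILE_FORMATS = Set.of(
            "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
            "org.apache.hadoop.mapred.TextInputFormat",
            "com.hadoop.compression.lzo.LzoTextInputFormat",
            "com.hadoop.mapred.DeprecatedLzoTextInputFormat");

    /** Throws if the InputFormat is missing from the whitelist. */
    static void checkSupported(String inputFormat) {
        // Set.of(...) rejects null lookups, so test for null first.
        if (inputFormat == null || !SUPPORTED_HIVE_FILE_FORMATS.contains(inputFormat)) {
            // Stand-in for Doris's NotSupportedException.
            throw new UnsupportedOperationException(
                    "Unsupported hive input format: " + inputFormat);
        }
    }

    public static void main(String[] args) {
        checkSupported("com.hadoop.compression.lzo.LzoTextInputFormat");
        System.out.println("LzoTextInputFormat passes validation");
    }
}
```

Before the two LZO entries were added, a check of this kind would reject both InputFormats, which matches the `NotSupportedException` described above.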
   
   ### Release note
   
   Hive Catalog now supports reading Hive tables that use `com.hadoop.compression.lzo.LzoTextInputFormat` or `com.hadoop.mapred.DeprecatedLzoTextInputFormat` as their InputFormat.
   
   ### Check List (For Author)
   
   - Test: Unit Test. Added `HiveUtilTest` (6 cases) and extended `HMSExternalTableTest` (2 cases).
   - Behavior changed: Yes. Hive tables with an LZO text InputFormat that previously threw `NotSupportedException` can now be queried.
   - Does this need documentation: No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

