JNSimba opened a new issue #8133:
URL: https://github.com/apache/incubator-doris/issues/8133


   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Version
   
   0.15.1-rc09
   
   ### What's Wrong?
   
   I want to ignore rows whose values do not fit the column length during stream load, but I found 
that if the batch contains only one row, `max_filter_ratio` has no effect.
   For example:
   1. Create table SQL
   ```sql
   CREATE TABLE `test_jieru1` (
     `id` int(11) NULL COMMENT "",
     `name` varchar(10) NULL COMMENT ""
   ) ENGINE=OLAP
   UNIQUE KEY(`id`)
   COMMENT "OLAP"
   DISTRIBUTED BY HASH(`id`) BUCKETS 1
   PROPERTIES (
   "replication_allocation" = "tag.location.default: 3",
   "in_memory" = "false",
   "storage_format" = "V2"
   );
   ```
   
   2. Data in example.json
   `[{"id":"1","name":"zhangsanzhangsanzhangsan"}]`
   
   3. Stream load command
   `curl -v --location-trusted -u root:123456 -H "format: json" -H 
"strip_outer_array: true" -H "jsonpaths: [\"$.id\", \"$.name\"]" -H 
"max_filter_ratio:1" -H "columns: id,name" -T example.json 
http://127.0.0.1:8030/api/test/test/_stream_load`
   
   Response:
   ```
   {
       "TxnId": 67452740,
       "Label": "711803f3-d44f-46c8-bc58-68f4ebb394d9",
       "Status": "Fail",
       "Message": "all partitions have no load data",
       "NumberTotalRows": 1,
       "NumberLoadedRows": 0,
       "NumberFilteredRows": 1,
       "NumberUnselectedRows": 0,
       "LoadBytes": 40,
       "LoadTimeMs": 6,
       "BeginTxnTimeMs": 0,
       "StreamLoadPutTimeMs": 0,
       "ReadDataTimeMs": 0,
       "WriteDataTimeMs": 4,
       "CommitAndPublishTimeMs": 0,
       "ErrorURL": 
"http://127.0.0.1:8040/api/_load_error_log?file=__shard_1005/error_log_insert_stmt_26496fe992d7e55c-409680f9eba491b4_26496fe992d7e55c_409680f9eba491b4";
   }
   
   errorURL detail is :
   Reason: the length of input is too long than schema. column_name: name; 
input_str: [zhangsanzhangsanzhangsan] schema length: 10; actual length: 24; . 
src line: []; 
   ```
   
   **When the batch contains multiple rows, the load succeeds**
   For example:
   `[{"id":"1","name":"zhangsanzhangsanzhangsan"},{"id":1,"name":"wangwu"}]`
   
   The response is:
   ```
   {
       "TxnId": 67452049,
       "Label": "2fea62bd-eb35-4a98-ae64-c861820e135c",
       "Status": "Success",
       "Message": "OK",
       "NumberTotalRows": 2,
       "NumberLoadedRows": 1,
       "NumberFilteredRows": 1,
       "NumberUnselectedRows": 0,
       "LoadBytes": 65,
       "LoadTimeMs": 23,
       "BeginTxnTimeMs": 0,
       "StreamLoadPutTimeMs": 0,
       "ReadDataTimeMs": 0,
       "WriteDataTimeMs": 5,
       "CommitAndPublishTimeMs": 15,
       "ErrorURL": 
"http://127.0.0.1:8040/api/_load_error_log?file=__shard_945/error_log_insert_stmt_7440e508eae8659f-ce310b4ed5edf196_7440e508eae8659f_ce310b4ed5edf196";
   }
   
   ```
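
To make the comparison concrete, here is a minimal Python sketch of the ratio arithmetic for the two responses above. It is not Doris code: `filter_ratio` and `within_tolerance` are hypothetical helpers, and the comparison `filtered / total <= max_filter_ratio` is an assumption about how the tolerance is meant to work. Under that assumption both loads are within `max_filter_ratio: 1`, so the single-row failure appears to come from a separate condition (the message "all partitions have no load data") rather than from the ratio itself.

```python
def filter_ratio(total_rows: int, filtered_rows: int) -> float:
    """Fraction of rows rejected by the load (0.0 when nothing was read)."""
    if total_rows == 0:
        return 0.0
    return filtered_rows / total_rows


def within_tolerance(total_rows: int, filtered_rows: int, max_filter_ratio: float) -> bool:
    """Assumed check: the load is acceptable if the ratio does not exceed the limit."""
    return filter_ratio(total_rows, filtered_rows) <= max_filter_ratio


# Row counts copied from the two responses in this report.
single_row = {"NumberTotalRows": 1, "NumberFilteredRows": 1, "NumberLoadedRows": 0}
two_rows = {"NumberTotalRows": 2, "NumberFilteredRows": 1, "NumberLoadedRows": 1}

for name, resp in (("single_row", single_row), ("two_rows", two_rows)):
    ratio = filter_ratio(resp["NumberTotalRows"], resp["NumberFilteredRows"])
    ok = within_tolerance(resp["NumberTotalRows"], resp["NumberFilteredRows"], 1.0)
    print(name, "ratio:", ratio, "within max_filter_ratio=1:", ok,
          "loaded rows:", resp["NumberLoadedRows"])
```

The only difference between the two cases is that the failing one ends with zero loaded rows.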
   
   ### What You Expected?
   
   `max_filter_ratio` should also take effect when the batch contains only a single row.
   
   ### How to Reproduce?
   
   _No response_
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


