spaces-X opened a new pull request, #9436: URL: https://github.com/apache/incubator-doris/pull/9436
# Proposed changes

The `row_number()` function in **Spark** returns **an integer-typed value**, which causes two problems in Spark Load.

**Case 1: loading a large amount of data at one time causes `row_number()` to overflow.** When the cardinality of the columns to be encoded in the **data imported at one time** exceeds about **2.1 billion** (the 32-bit integer maximum, 2,147,483,647), `row_number()` wraps around and returns a negative number.

**Case 2: loading data in multiple batches causes the maximum `dict_value` in the global dictionary to exceed the integer range, but we do not cast it to `bigint`.**

---

For case 1, I think it is a design flaw that creates a bottleneck for one-shot loading. Since case 1 arises in relatively few scenarios, it can be worked around in the short term by importing the data in multiple batches. Case 2 is solved by this PR.

## Problem Summary:

Describe the overview of changes.

## Checklist(Required)

1. Does it affect the original behavior: (Yes/No/I Don't know)
2. Has unit tests been added: (Yes/No/No Need)
3. Has document been added or modified: (Yes/No/No Need)
4. Does it need to update dependencies: (Yes/No)
5. Are there any changes that cannot be rolled back: (Yes/No)

## Further comments

If this is a relatively large or complex change, kick off the discussion at [[email protected]](mailto:[email protected]) by explaining why you chose the solution you did and what alternatives you considered, etc...
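The overflow in case 1 is ordinary 32-bit integer wraparound: Spark's `row_number()` yields an `IntegerType`, so once more than 2,147,483,647 distinct values need dictionary codes, the counter wraps negative. A minimal sketch of that wraparound in plain Python, simulating 32-bit arithmetic with `ctypes` (the helper name `to_int32` is illustrative, not from the PR):

```python
import ctypes

INT32_MAX = 2_147_483_647  # upper bound of Spark's IntegerType / row_number()

def to_int32(v: int) -> int:
    """Truncate a Python int to signed 32-bit, mimicking integer overflow."""
    return ctypes.c_int32(v).value

print(to_int32(INT32_MAX))      # 2147483647 -- last valid dict value
print(to_int32(INT32_MAX + 1))  # -2147483648 -- wraps negative, as in case 1
```

The same arithmetic explains case 2: even when each batch stays under the limit, the running maximum `dict_value` accumulated across batches can cross it, so the column holding it must be a 64-bit `bigint` rather than a 32-bit integer.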
