llama90 commented on issue #38569:
URL: https://github.com/apache/arrow/issues/38569#issuecomment-1801998369

   Thank you for your response.
   
   #### For string
   
   > BTW, why did you compare AsciiLower/AsciiUpper performance with multiple inputs? They are Arrow's compute kernels not Gandiva's functions. We are working on Gandiva's benchmark not Arrow's compute kernels, right?
   
   You are right. We are working on Gandiva's benchmark.
   
   In **summary**, it appears that:
   
   * We should focus on `Allocs` rather than `upper`.
   * The mix of uppercase and lowercase characters in string generation is 
_**not**_ significant.
   * Previously, strings of lowercase characters of varying lengths from 1 to 
64 were generated, but for benchmark consistency, we now maintain a length of 
64 when generating strings.
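
   As a rough illustration of the last point, fixed-length lowercase input could be produced along these lines. This is only a sketch: `MakeLowercaseStrings` is a hypothetical helper written for this comment, not the generator used by the Gandiva benchmarks or by `random.h`, and the seed and counts are arbitrary.

   ```cpp
   #include <cstddef>
   #include <cstdint>
   #include <random>
   #include <string>
   #include <utility>
   #include <vector>

   // Hypothetical helper (not part of Arrow or Gandiva): builds `count` strings,
   // each exactly `length` characters long, drawn uniformly from 'a'..'z'.
   std::vector<std::string> MakeLowercaseStrings(std::size_t count, std::size_t length,
                                                 std::uint32_t seed) {
     std::mt19937 rng(seed);
     std::uniform_int_distribution<int> pick(0, 25);  // offset into the alphabet
     std::vector<std::string> out;
     out.reserve(count);
     for (std::size_t i = 0; i < count; ++i) {
       std::string s(length, 'a');
       for (char& c : s) {
         c = static_cast<char>('a' + pick(rng));
       }
       out.push_back(std::move(s));
     }
     return out;
   }
   ```

   Calling `MakeLowercaseStrings(n, 64, seed)` gives every row the same length, so the amount of output buffer needed per row stays constant across runs.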
   
   <details><summary>previous generated string data</summary>
   
   ```
   [
       "bcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrs",
       "tuvwxyzabc",
       "defghijklm",
       "nopqrstuvwxyzabcdefghijklmnopqrstuvwxy",
       "zabcdefghijklmnopqrstuvwx",
       "yzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc",
       "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghij",
       "klmnopqrstu",
       "vwxyzabcdefghijklmnopqrstuvwxy",
       "z",
       ...
       "ijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrs",
       "tuvwxyzabcdefgh",
       "ijklmnopqrstuvwxyzabcdefghijklmnopqrs",
       "tuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopq",
       "rstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopq",
       "rstuvwxyzabcdefghijklmn",
       "opqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklm",
       "nopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr",
       "stuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrst",
       "uvwxyzabcdefghijklmnopqrstuvwxyzabc"
     ]
   ```
   
   </details>
   
   To briefly explain why I ran the test: the previous benchmarks generated data from characters in the `a`-`z` range, whereas `random.h` selects characters from the `A`-`z` range. When I saw the difference in benchmark times, I wondered whether the character range was the cause.
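
   To make the difference between the two ranges concrete, here is a small sketch that counts what a contiguous ASCII range contains. It assumes the `A`-`z` range means every code point from `'A'` (65) to `'z'` (122); `CountAsciiClasses` is a hypothetical helper, not something from Arrow.

   ```cpp
   // Tally of character classes inside a contiguous ASCII range.
   struct RangeCounts {
     int upper = 0;
     int lower = 0;
     int other = 0;
   };

   // Hypothetical helper: classifies every code point in [first, last].
   RangeCounts CountAsciiClasses(char first, char last) {
     RangeCounts counts;
     for (int c = first; c <= last; ++c) {
       if (c >= 'A' && c <= 'Z') {
         ++counts.upper;
       } else if (c >= 'a' && c <= 'z') {
         ++counts.lower;
       } else {
         ++counts.other;  // punctuation that sits between 'Z' and 'a'
       }
     }
     return counts;
   }
   ```

   Under that assumption, `CountAsciiClasses('A', 'z')` reports 26 uppercase letters, 26 lowercase letters, and 6 non-letter characters, while `CountAsciiClasses('a', 'z')` contains lowercase letters only.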
   
   ```diff
   # previous
   - TimedTestAllocs/min_time:1.000                     140487 us       137767 us           10
   - TimedTestOutputStringAllocs/min_time:1.000         228228 us       226211 us            6
   # random.h
   + TimedTestAllocs/min_time:1.000                     243130 us       242760 us            6
   + TimedTestOutputStringAllocs/min_time:1.000         332357 us       331799 us            4
   ```
   
   Upon further inspection, particularly of the `arrow-compute-scalar-string-benchmark`, I noticed that `AsciiLower` and `AsciiUpper` perform differently depending on the range of characters in the string, which is why I mentioned the `upper` aspect.
   
   My understanding of Gandiva is limited, but I believe that mixing 
`uppercase` and `lowercase` could have a similar impact.
   
   #### For decimal
   
   In this case, I think you should check the `precision` and `scale` of the Decimal Type first. Was I searching for the wrong thing? I had understood scale and precision in reverse.
   
   The existing decimal generation code was implemented like this:

   https://github.com/apache/arrow/blob/e62ec62e40b04b0bfce76d58369845d3aa96a419/cpp/src/gandiva/tests/generate_data.h#L82-L98
   
   Anyway, I'm having a hard time figuring out how to fit this in because for 
`random.h`, the `precision` and `scale` values are the same as before, but the 
random values generated are different.
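
   One way to see the difference is to count the digits on each side of the decimal point in the printed samples. The sketch below does that for a decimal rendered as text; `CountDecimalDigits` is a hypothetical helper for illustration only and does not use Arrow's decimal classes.

   ```cpp
   #include <cstddef>
   #include <string>

   // Number of digits before and after the decimal point.
   struct DigitCounts {
     std::size_t integer_digits = 0;
     std::size_t fraction_digits = 0;
   };

   // Hypothetical helper: counts digits in a decimal printed as text,
   // ignoring a leading sign and any other non-digit characters.
   DigitCounts CountDecimalDigits(const std::string& text) {
     DigitCounts counts;
     bool seen_point = false;
     for (char c : text) {
       if (c == '.') {
         seen_point = true;
         continue;
       }
       if (c < '0' || c > '9') {
         continue;  // skip the sign
       }
       if (seen_point) {
         ++counts.fraction_digits;
       } else {
         ++counts.integer_digits;
       }
     }
     return counts;
   }
   ```

   Applied to the samples listed in this comment, `16351795034091788378275.759672` from the previous generator has 23 integer digits and 6 fractional digits, while `-62154291670684288853156612735793.477627` from `random.h` has 32 integer digits and 6 fractional digits. The fractional part matches, but the `random.h` values fill many more of the available integer digits.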
   
   <details><summary>previous generated DecimalAdd2LeadingZeroes data</summary>
   
   ```
   [
       16351795034091788378275.759672,
       27854373977842001526319.734684,
       6061446293668872071695.356134,
       31215470323596391714672.813452,
       15533332405516061115471.066480,
       15401288582294600038721.016490,
       30274851179048759105303.625351,
       6564021121342245976114.232879,
       36171304665527520454124.155732,
       3546578283628563437121.241384,
       ...
       14860910085674466129132.927835,
       6131533904566936343671.748478,
       36528723632058120161272.517328,
       3964543116664404486145.446787,
       20606324915198138790682.388805,
       20631004666520402209109.905293,
       29878891319444493588791.347655,
       13823654056734867583462.044256,
       30786726283235031078880.608496,
       620129083913073530975.895978
     ]
   ```
   
   </details>
   
   <details><summary>`random.h` generated DecimalAdd2LeadingZeroes data</summary>
   
   ```
   [
     -10038291425340000002078631.818296,
     20434255526017535581463554076021.370910,
     45129537519062609815007557527197.169527,
     -62154291670684288853156612735793.477627,
     87103160282621257590199953623276.740180,
     79969063163511904595223170720469.422729,
     -86583636649126666385105002765466.142621,
     92352037140996988957666339677514.598197,
     25931328664987260071252581126967.305358,
     91215577756571866841817287196691.182509,
     ...
     52281930080182271842503590963377.447177,
     4372942953952971927671269281255.305966,
     96763102312104951696190306530776.653371,
     -18467695819003138567360094105422.436093,
     -9792209172207529275819289504154.021690,
     61852766762170189161854285754108.616433,
     25571897876364008995405425598827.463597,
     64701056156004809966065180476482.250553,
     -63257829389611554338056688929518.067776,
     67455720423980831654534847333775.750429
   ]
   ```
   
   </details>
   
   <details><summary>previous generated DecimalAdd2LeadingZeroesWithDiv data</summary>
   
   ```
   [
       7587380203.400195060290629070,
       26540763051.629767310192345708,
       7223529576.634111708006082596,
       6365242807.233231030439853103,
       30019804134.302409408324050740,
       25340645699.884287035225737289,
       10696019445.722928448197246826,
       949466737.960961850748143665,
       10100302893.597466582610415885,
       3997196768.260434038917129614,
       ...
       32227537378.843572088950643055,
       8537777136.877817787746240968,
       30551609561.954519025705399073,
       26680430508.348522923995981443,
       33569096800.981186984459717330,
       3351111436.781940463508363016,
       24558127179.325481340697631536,
       22734710467.783675740496138638,
       20732616951.383696862539113282,
       23868794600.064308290332457814
     ]
   ```
   
   </details>
   
   <details><summary>`random.h` generated DecimalAdd2LeadingZeroesWithDiv data</summary>
   
   ```
   [
     14208508617760.000001420629929750,
     34494975892204083434.054431421823503934,
     -59808716875461356024.747039105565396418,
     -39719453848714764419.682550443910720831,
     -87317155083825605821.703637655099976619,
     -65044845346732651787.465211119366632301,
     -88009445715660277207.144934805781121915,
     95937137753543307790.043195390657972343,
     20590051896038797936.563280809743603093,
     49223527283939584300.218543938584200541,
     ...
     19342162338087482443.922956656008521477,
     -48849509529657253769.022422810864683008,
     -22694641636271599896.031597585758119284,
     -40556169347524762115.197985730587632297,
     12188092371682991218.322898910856250055,
     85478353248271990406.317595833539039857,
     75882328331451585423.885909771843576864,
     53687092121591423454.148705140146222481,
     -44309181005291397260.084537547533448259,
     -95059575541611513750.967476857639700305
   ]
   ```
   
   </details>
   

