zmin1217 opened a new issue, #64:
URL: https://github.com/apache/datasketches-hive/issues/64

   On the TPC-H dataset, I use theta sketches to compute intersections. The error of some 
results reaches 41%, but the docs say the default size (4096) gives about 3% error.
   
   spark.sql("create temporary function data2sketch as 'org.apache.datasketches.hive.theta.DataToSketchUDAF'")
   spark.sql("create temporary function intersect as 'org.apache.datasketches.hive.theta.IntersectSketchUDF'")
   spark.sql("create temporary function estimate as 'org.apache.datasketches.hive.theta.EstimateSketchUDF'")
   
   scala> lineitem.select("l_suppkey").intersect(order.select("o_orderkey")).count
   res17: Long = 250000
   But the theta sketch result is 145593, so the error is 0.41.
   
   scala> customer.select("c_custkey").intersect(lineitem.select("l_orderkey")).count
   res18: Long = 3750000
   But the theta sketch result is 4404198, so the error is 0.14.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

