You could just filter the input for transactions containing the
desired item and discard the rest. That doesn't mean every item set or
rule you get back will involve that item, so you'd still have to
filter the output, but it may be much faster to compute.
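
Something like this rough PySpark sketch (assuming a DataFrame df with
an array column "items" and some target_item of interest; the
thresholds are made up):

    from pyspark.sql.functions import array_contains
    from pyspark.ml.fpm import FPGrowth

    # Keep only transactions containing the item of interest; note the
    # support base changes, since it's relative to the filtered data.
    filtered = df.filter(array_contains("items", target_item))

    fp = FPGrowth(itemsCol="items", minSupport=0.2, minConfidence=0.5)
    model = fp.fit(filtered)

    # The mined rules still include other consequents, so filter again.
    rules = model.associationRules.filter(
        array_contains("consequent", target_item))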
Increasing min support generally has the effect of producing shorter
rules, though it doesn't impose a hard cap on length. That could help
performance, if that's what you're trying to improve.
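
Something like the following, continuing the sketch above (the 0.3
support and the length cap of 5 are arbitrary):

    from pyspark.sql.functions import size, col
    from pyspark.ml.fpm import FPGrowth

    # A higher minSupport prunes rare (and typically long) itemsets
    # early, which tends to keep the surviving rules short, but it is
    # not a hard cap.
    fp = FPGrowth(itemsCol="items", minSupport=0.3, minConfidence=0.5)
    model = fp.fit(df)

    # Enforce the cap after the fact by filtering on antecedent length.
    short_rules = model.associationRules.filter(
        size(col("antecedent")) <= 5)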
I don't know whether it's worth adding new params to the
implementation; maybe. I think there would have to be an argument that
this generalizes.

On Sat, May 2, 2020 at 3:13 AM Aditya Addepalli <dyex...@gmail.com> wrote:
>
> Hi Everyone,
>
> I was wondering if we could make any enhancements to the FP-Growth algorithm 
> in spark/pyspark.
>
> Many times I am looking for rules for a particular consequent, so I don't
> need the rules for all the other consequents. I know I can filter the rules
> to get the desired output, but if I could pass this into the algorithm
> itself, the execution time would be drastically reduced.
>
> Also, sometimes I want the rules to be short, maybe of length 5-6. Again, I
> can filter on length, but I was wondering if we could take this as an input
> to the algorithm. Given the depth-first nature of FP-Growth, I am not sure
> that is feasible.
>
>  I am willing to work on these suggestions, if someone thinks they are 
> feasible. Thanks to the dev team for all the hard work!
>
> Regards,
> Aditya Addepalli
