Hi Sean,

I understand your approach, but there's a slight problem.

If we generate rules after filtering the input for our desired consequent,
we introduce bias into the rules.
The confidence of a rule on the filtered input can be very high, but this
may not be the case on the entire dataset.
We can therefore end up with biased rules that misrepresent the patterns in
the data.
This is why I think a parameter to specify the consequent would help
greatly.
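
To make the bias concrete, here is a minimal sketch in plain Python. It is
not Spark code: the toy baskets and the helper name `confidence` are
assumptions made up for illustration. It only uses the standard definition
confidence(A -> C) = count(A and C) / count(A).

```python
def confidence(transactions, antecedent, consequent):
    """confidence(A -> C) = count(A and C together) / count(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a if n_a else 0.0


# Toy data (hypothetical): "milk" is in 10 baskets, "rare" is in only 1.
full = [frozenset({"milk", "rare"})] + [frozenset({"milk", "bread"})] * 9

# Pre-filter the input to baskets that contain the desired consequent.
filtered = [t for t in full if "rare" in t]

conf_full = confidence(full, {"milk"}, {"rare"})          # 1 of 10 baskets
conf_filtered = confidence(filtered, {"milk"}, {"rare"})  # 1 of 1 basket

print(conf_full, conf_filtered)  # prints: 0.1 1.0
```

On the filtered input every basket contains the consequent, so any rule
with that consequent gets confidence 1.0 regardless of how weak it is on
the full data.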

Reducing the support doesn't really work in my case simply because rules
for the consequents I am mining for occur very rarely in the data.
Their support can be as low as 1e-4 or 1e-5, so my minSupport has to be
below that to capture the rules for that consequent.
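
As a rough illustration of the scale involved, here is a small sketch in
plain Python (again not Spark code; the data and the helper names `support`
and `survives` are hypothetical). It shows that an itemset present in 5 of
100,000 baskets is only kept once the threshold drops below 5e-5.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)


# Toy data (hypothetical): 100,000 baskets, 5 contain the rare consequent.
baskets = [frozenset({"a", "rare"})] * 5 + [frozenset({"a", "b"})] * 99_995

rare_support = support(baskets, {"a", "rare"})  # 5 / 100,000


def survives(min_support):
    """True if the rare itemset passes the given support threshold."""
    return rare_support >= min_support


print(survives(1e-3), survives(1e-5))  # prints: False True
```

At a threshold that low, itemsets for every other consequent are kept as
well, which is where the extra work comes from.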

Thanks for your reply. Let me know what you think.

Regards,
Aditya Addepalli




On Sat, 2 May, 2020, 9:13 pm Sean Owen, <sro...@gmail.com> wrote:

> You could just filter the input for sets containing the desired item,
> and discard the rest. That doesn't mean all of the item sets have that
> item, and you'd still have to filter, but may be much faster to
> compute.
> Increasing min support might generally have the effect of smaller
> rules, though it doesn't impose a cap. That could help perf, if that's
> what you're trying to improve.
> I don't know if it's worth new params in the implementation, maybe. I
> think there would have to be an argument this generalizes.
>
> On Sat, May 2, 2020 at 3:13 AM Aditya Addepalli <dyex...@gmail.com> wrote:
> >
> > Hi Everyone,
> >
> > I was wondering if we could make any enhancements to the FP-Growth
> > algorithm in spark/pyspark.
> >
> > Many times I am looking for a rule for a particular consequent, so I
> > don't need the rules for all the other consequents. I know I can filter the
> > rules to get the desired output, but if I could input this in the algorithm
> > itself, the execution time would reduce drastically.
> >
> > Also, sometimes I want the rules to be small, maybe of length 5-6.
> > Again, I can filter on length but I was wondering if we could take this as
> > input into the algo. Given the Depth first nature of FP-Growth, I am not
> > sure that is feasible.
> >
> > I am willing to work on these suggestions, if someone thinks they are
> > feasible. Thanks to the dev team for all the hard work!
> >
> > Regards,
> > Aditya Addepalli
>
