Grant:  Chapter 5 of Han and Kamber (Data Mining: Concepts and
Techniques) details itemset mining and the FP-growth algorithm.  Han is
one of its co-inventors.

There is a bit of repetition in the output compared to other itemset
mining packages, though this structure is convenient for relational
indexing by key.
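
As a rough illustration of where that repetition comes from, here is a
minimal sketch on made-up transactions. It assumes each pattern is listed
under every item it contains, and it counts support by brute force rather
than with FP-growth. The names support_counts and top_k_by_key are
hypothetical; this is not Mahout's code.

```python
from collections import defaultdict
from itertools import combinations

# Toy transactions (hypothetical data, not taken from the thread).
transactions = [
    {12, 17, 68},
    {17, 68},
    {12, 17, 68},
    {18, 68},
]

def support_counts(transactions, max_size=3):
    """Count how many transactions contain each itemset (brute force, not FP-growth)."""
    counts = defaultdict(int)
    for t in transactions:
        items = sorted(t)
        for size in range(1, min(max_size, len(items)) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts

def top_k_by_key(counts, k=3):
    """Index patterns under each item they contain, keeping the k with highest support."""
    by_key = defaultdict(list)
    for itemset, support in counts.items():
        for item in itemset:
            by_key[item].append((itemset, support))
    return {
        # Sort by descending support, then shorter itemsets, then item order.
        item: sorted(patterns, key=lambda p: (-p[1], len(p[0]), p[0]))[:k]
        for item, patterns in by_key.items()
    }

counts = support_counts(transactions)
index = top_k_by_key(counts)
print(index[68])
print(index[17])
```

In this sketch the pattern (17, 68) shows up under both key 17 and key 68,
which is the kind of repetition I mean: redundant as a flat list, but a
lookup by any single item returns its top patterns directly.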

- Neal

On Mon, Feb 15, 2010 at 6:49 AM, Robin Anil <[email protected]> wrote:
> Ok.. A bit more background..
>
> An Itemset is a subset of the items I1, I2, I3 ... In
>
> so [I2, I4, I7] is an itemset, and its support (the number of times it
> appears in the dataset) is, say, Y
>
> A Pattern is Pair<Itemset, support>
>
> Take a look at it in this format
>
> 68:
>     ([68],90692),
>     ([17, 68],90683),
>     ([12, 68],90490),
>     ([17, 12, 68],90481),
>     ([18, 68],90291)
>
> These are the top patterns containing 68, in descending order of support.
> For example, 68 occurs together with 12 a total of 90490 times.
>
> Robin
>
>
> On Mon, Feb 15, 2010 at 6:27 PM, Grant Ingersoll <[email protected]> wrote:
>
>>
>> On Feb 14, 2010, at 11:37 PM, Robin Anil wrote:
>>
>> > Each key is a feature and each attribute is the topK frequent patterns
>> > where the feature exists
>>
>> Still a bit confused.
>> Given:
>> Key: 68: Value: ([68],90692), ([17, 68],90683), ([12, 68],90490), ([17, 12,
>> 68],90481), ([18, 68],90291), ([17, 18, 68],90282), ([12, 18, 68],90229),
>> ([17, 12, 18, 68],90220), ([31, 68],89071), ([17, 31, 68],89062), ([12, 31,
>> 68],88874), ([17, 12, 31, 68],88865), ([18, 31, 68],88681), ([17, 18, 31,
>> 68],88672), ([12, 18, 31, 68],88619), ([17, 12, 18, 31, 68],88610), ([16,
>> 68],87933),
>>
>> So, 68 is the feature in question.  That makes sense.  Then, what is the
>> significance of the [] areas, as in [68],90692 or [17,12,68], 90481.  Why
>> all the repetition?
>>
>> -Grant
>
