[
https://issues.apache.org/jira/browse/SPARK-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633041#comment-14633041
]
dandantu commented on SPARK-8999:
---------------------------------
In the real world, much data has a time-series or sequential structure, such as
web page visit sequences, signaling sequence data in the telecom field, gene
sequences in biology, and temporal mobile trajectory data. In the Internet,
telecom, and biology fields, the frequent sequential pattern mining algorithm
PrefixSpan is widely used:
(1) In the telecom field, there are two use cases. In the first, we use
PrefixSpan to extract frequent signaling sequence features to predict which
customers are likely to complain. Sometimes signaling events arrive at the same
time, so PrefixSpan should support non-temporal sequences.
The second telecom use case is mining frequent POI (point of interest) visit
sequences from users' mobile trajectories, in order to optimize road planning
for the ministry of traffic planning or for popular tourist spots.
(2) In the Internet field, many portal and shopping websites use PrefixSpan
to mine frequent page-visit sequences, then optimize their page content to
improve the user experience.
(3) In biology, PrefixSpan is used to mine frequent gene sequences.
Non-temporal sequences always occur in this use case.
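To make "non-temporal sequence" concrete: events that arrive at the same time can be grouped into one itemset, so a sequence becomes an ordered list of unordered itemsets. The Scala sketch below checks whether a pattern occurs in such a sequence, which is the containment notion a sequential pattern miner uses when counting support. `ItemsetSequence.contains` is a hypothetical helper for illustration, not an MLlib API.

```scala
// Hypothetical helper for illustration only; not part of Spark MLlib.
object ItemsetSequence {
  /** Returns true if `pattern` occurs in `sequence`: each pattern itemset must
    * be a subset of a distinct sequence itemset, and those matched itemsets
    * must appear in the same order. Items inside one itemset are unordered
    * because they happened at the same time. */
  def contains(sequence: Seq[Set[Int]], pattern: Seq[Set[Int]]): Boolean = {
    var pos = 0 // index of the first sequence itemset still available
    pattern.forall { itemset =>
      // Greedily take the earliest remaining itemset that covers this one;
      // matching as early as possible never rules out a later match.
      val idx = sequence.indexWhere(s => itemset.subsetOf(s), pos)
      if (idx >= 0) { pos = idx + 1; true } else false
    }
  }
}
```

For a signaling sequence `Seq(Set(1), Set(2, 3), Set(4))`, where events 2 and 3 arrived together, the pattern `Seq(Set(1), Set(3))` is contained, but `Seq(Set(2), Set(3))` is not: 2 and 3 co-occurred rather than following one another, a distinction a purely temporal (one item per step) representation cannot express.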
> Support non-temporal sequence in PrefixSpan
> -------------------------------------------
>
> Key: SPARK-8999
> URL: https://issues.apache.org/jira/browse/SPARK-8999
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.5.0
> Reporter: Xiangrui Meng
> Priority: Critical
>
> In SPARK-6487, we assume that all items are ordered. However, we should
> support non-temporal sequences in PrefixSpan. This should be done before 1.5
> because it changes PrefixSpan APIs.
> We can use `Array[Array[Int]]` or follow SPMF to use `Array[Int]` and use -1
> to mark itemset boundaries. The latter is more efficient for storage. If we
> support generic item type, we can use null.
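The two encodings mentioned above can be compared with a small sketch. `SequenceEncoding`, `encode`, and `decode` are hypothetical names, not part of Spark. The sketch assumes non-negative integer item IDs so that -1 can safely act as the itemset delimiter, and it writes a -1 after every itemset, which is one possible reading of "mark itemset boundaries".

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helpers for illustration only; not part of Spark MLlib.
object SequenceEncoding {
  val Delimiter: Int = -1 // assumes real item IDs are non-negative

  /** Flattens Array[Array[Int]] into one Array[Int], appending Delimiter
    * after every itemset. */
  def encode(seq: Array[Array[Int]]): Array[Int] = {
    val out = ArrayBuffer.empty[Int]
    seq.foreach { itemset =>
      require(itemset.forall(_ >= 0), "item IDs must be non-negative")
      out ++= itemset
      out += Delimiter
    }
    out.toArray
  }

  /** Inverse of encode: splits the flat array back into itemsets. */
  def decode(flat: Array[Int]): Array[Array[Int]] = {
    val out = ArrayBuffer.empty[Array[Int]]
    var start = 0 // start index of the itemset currently being read
    for (i <- flat.indices if flat(i) == Delimiter) {
      out += flat.slice(start, i)
      start = i + 1
    }
    require(start == flat.length, "flat array must end with a delimiter")
    out.toArray
  }
}
```

For example, `Array(Array(1), Array(2, 3), Array(4))` encodes to `Array(1, -1, 2, 3, -1, 4, -1)`. The flat form keeps each sequence in a single primitive array instead of one array object per itemset, which is where the storage saving comes from. With a generic item type, the issue suggests null as the delimiter instead of -1.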
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)