[
https://issues.apache.org/jira/browse/MAHOUT-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982495#comment-13982495
]
Suneel Marthi commented on MAHOUT-1236:
---------------------------------------
[~ssc] IMO we definitely need something along these lines, especially for
clustering, where we need to store a cluster name plus other properties. We saw
this issue with KMeans clustering for 0.9 while working on the fix for M-1030,
and we ended up converting everything into Named Vectors while serializing the
clusters. (That definitely wasn't the most efficient way to do it.)
Not sure whether this can be part of the new Scala DSL effort, or whether we
should just close this and open a new issue if someone steps up to work on it.
> Need a cleaned up serialized format for Vectors to handle names and all other
> kinds of things
> ---------------------------------------------------------------------------------------------
>
> Key: MAHOUT-1236
> URL: https://issues.apache.org/jira/browse/MAHOUT-1236
> Project: Mahout
> Issue Type: Bug
> Reporter: Ted Dunning
> Fix For: 1.0
>
>
> Our current serialization is subject to several ills:
> a) it breaks alignment by having a 1 byte flag field (evil, generic)
> b) it doesn't handle any kind of extensible format like protobufs so it isn't
> future-proof
> c) it doesn't handle named vectors very well
> d) it totally breaks with any other kind of decoration as with Centroids or
> WeightedVector or ... (see b)
> I propose that we use the current tag byte on the current serialization with
> a new flag bit that indicates that the vector will use a protobuf encoding.
> Then 3 bytes will be skipped to restore alignment. Then there will be a
> protobuf encoding for the vector.
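
To make the proposed layout concrete, here is a rough Java sketch of the framing
only: tag byte with a new flag bit, three padding bytes, then the encoded
vector. It is not taken from Mahout. The class name VectorFraming, the flag
value 0x80, and the opaque byte[] payload standing in for the protobuf bytes are
all assumptions made for illustration; the issue does not say which bit would be
used or what the protobuf schema would look like.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper, not part of Mahout.
final class VectorFraming {
    // Assumed flag bit marking "payload is protobuf-encoded"; the real bit is unspecified.
    static final int FLAG_PROTOBUF = 0x80;
    // Bytes skipped after the 1-byte tag so the payload starts at offset 4.
    static final int PADDING_BYTES = 3;

    /** Writes the tag byte with the new flag set, 3 zero bytes, then the payload. */
    static byte[] frame(int existingFlags, byte[] payload) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeByte(existingFlags | FLAG_PROTOBUF);
        for (int i = 0; i < PADDING_BYTES; i++) {
            out.writeByte(0);
        }
        out.write(payload); // stand-in for the protobuf-encoded vector
        out.flush();
        return buffer.toByteArray();
    }

    /** Returns the payload, or null when the flag is absent (legacy encoding). */
    static byte[] unframe(byte[] framed) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
        int tag = in.readUnsignedByte();
        if ((tag & FLAG_PROTOBUF) == 0) {
            return null; // caller would fall back to the existing reader
        }
        if (in.skipBytes(PADDING_BYTES) != PADDING_BYTES) {
            throw new IOException("truncated padding after tag byte");
        }
        byte[] payload = new byte[framed.length - 1 - PADDING_BYTES];
        in.readFully(payload);
        return payload;
    }
}
```

Because the new format is signalled by a bit in the existing tag byte, a reader
can keep handling old data unchanged and only branch when the bit is set. In a
real stream the payload would also need a length prefix or delimiter, which this
sketch leaves out by treating the rest of the buffer as the payload.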
--
This message was sent by Atlassian JIRA
(v6.2#6252)