Hi all,

I'm working on a project that uses the current LDA implementation contributed by Databricks' Joseph Bradley et al. for the recent 1.3.0 release (thanks, guys!). While this is great, my project requires several levels of topics, because I would like to let users drill down into subtopics.
As I understand it, Hierarchical Latent Dirichlet Allocation (HLDA) would offer such a hierarchy. Looking at the papers and talks by Blei [1,2] and Jordan [3], I think I should be able to implement HLDA in Spark using the Nested Chinese Restaurant Process (NCRP). However, as I have some time constraints, I'm not sure if I will have the time to do it 'the proper way'.

In any case, I wanted to quickly ask around if anybody is already working on this or on some other form of a hierarchical topic model. Maybe I could contribute to these efforts instead of starting from scratch.

Best,
Lorenz

[1] http://www.cs.princeton.edu/~blei/papers/BleiGriffithsJordan2009.pdf
[2] http://papers.nips.cc/paper/2466-hierarchical-topic-models-and-the-nested-chinese-restaurant-process.pdf
[3] https://www.youtube.com/watch?v=PxgW3lOrj60