So, I’ve gotten much of this running and I’m now down to refinement.  I’m 
pulling documents off of some RSS feeds, enhancing them and storing them in the 
Content Hub.  The Content Hub portion will need to change for production, but 
for now it’s good for a POC.  I’m getting pretty good tag results out of the 
enhancement engines aside from a few oddities; for instance, some countries 
are recognized as US cities.  

To get around this I’m programmatically filtering my results using the 
confidence value.  Is there a way to configure the enhancement chain to do this 
filtering for me?  With a Rule possibly?  I feel like it can do it, but I can’t 
figure out how.  Any thoughts?
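
For reference, the client-side filtering described above can be sketched roughly as follows.  This is only a minimal illustration, not the actual integration code: the ConfidenceFilter and Tag names, the sample labels and scores, and the 0.5 threshold are all hypothetical.  In a real setup the scores would presumably be read from the fise:confidence values in the enhancement results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: keeps only tags whose confidence meets a threshold.
final class ConfidenceFilter {

    /** Hypothetical holder for one tag parsed from an enhancement result. */
    static final class Tag {
        final String label;
        final double confidence;

        Tag(String label, double confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    /** Returns the tags with confidence >= minConfidence, in input order. */
    static List<Tag> filter(List<Tag> tags, double minConfidence) {
        List<Tag> kept = new ArrayList<>();
        for (Tag tag : tags) {
            if (tag.confidence >= minConfidence) {
                kept.add(tag);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Tag> tags = Arrays.asList(
                new Tag("Paris", 0.92),
                new Tag("Georgia", 0.31),
                new Tag("Stanbol", 0.75));

        // Print only the tags that pass the (assumed) 0.5 threshold.
        for (Tag tag : filter(tags, 0.5)) {
            System.out.println(tag.label + " " + tag.confidence);
        }
    }
}
```

The threshold is a parameter rather than a constant, so it can be tuned per feed if some sources turn out to be noisier than others.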

Thanks,
Mark
On Mar 1, 2014, at 9:34 AM, Mark Loper <mark.lo...@cyrenllc.com> wrote:

> Thanks Rafa!  I’m starting some Java integration now trying to put the data 
> to some intelligent use.  I’m glad to hear that the entire stack works 
> offline as built.  That’s very helpful.  I’ll bring up more questions as they 
> come; I just got my environment set up for this, so I have a ways to go 
> before I have a good set of questions.  
> 
> 
> On Feb 26, 2014, at 11:01 AM, Rafa Haro <rh...@apache.org> wrote:
> 
>> Hi Mark,
>> 
>> On 26/02/14 15:57, Mark Loper wrote:
>>> Hi, this is going to take some clarification so you understand what I’m 
>>> trying to accomplish.  Bear with me.
>>> 
>>> I’m looking at Stanbol as a solution to some requirements that I have 
>>> received from my customer.  They want dynamic and intelligent content with 
>>> a social feel across all of their content DBs.  I’m looking at around 3PB 
>>> of data stored in various databases ranging from geospatial imagery, 
>>> documents, images, and video.  I want to show that the semantic web 
>>> can help them see, view and find their data faster and more intelligently.  
>>> I want to be able to feed documents into Stanbol, get a tag cloud based on 
>>> information in the object and find out what objects are most related based 
>>> upon the relationships that are found over time.
>> Glad to see you are considering Stanbol for such an interesting use case :-)
>>> 
>>>  My background is as a developer mainly dealing with geospatial image 
>>> processing, large object delivery over low coms, security, and OGC systems 
>>> like ESRI and OpenGeo.  CMS and the semantic web are proving to be a large 
>>> area of study that I’m trying to get up to speed with.  I did however get 
>>> Stanbol up and running quickly and very easily and have run a few documents 
>>> through the default config and have been happy with the results.
>>> 
>>>  Here is a basic list of what I want to have happen, and I’m having trouble 
>>> finding a use case that describes anything close.
>> Let me try to shed some light on these requirements, and let's wait for 
>> more suggestions from the community.
>>> 
>>> 1.  I can’t store the data; it needs to reside at the origin servers, so 
>>> I’m “enhancing” links to objects/metadata.
>> As long as you can gather that content and send it to Stanbol's Enhancer API 
>> using one of the supported media types, that is not a problem
>>> 2.  I don’t have internet access, so this needs to live on a closed 
>>> network.  I could load my own copies of things like DB-pedia, maybe.
>> That is exactly the way Stanbol works out of the box, with local sites, 
>> although Stanbol can also use remote sites. For instance, as you might know, 
>> a DBpedia site with 43K entities is created by default.
>>> 3.  I want to grow the intelligence, not start out with everything.  The 
>>> customer is most interested in what is recent, not what is 15 years old, so 
>>> I don’t need to consume all their data.
>> Initially not relevant from the technical point of view.
>>> 4.  I want to push every document a user looks at through the system, and 
>>> then over a short amount of time I expect I will have a decent and growing 
>>> library of connections between what is current and important.
>> Currently, Stanbol doesn't provide services for making sense of the 
>> metadata extracted by the Enhancer. So that is something you would have 
>> to build yourself.
>>> 5.  When looking at an image or video, I’d like the user to be able to tag 
>>> that object and based on that tag add that to the enhancements of that and 
>>> other objects in the system.
>> Currently, there aren't engines to enhance images or videos, although this 
>> functionality is in the backlog and for example has been proposed as a 
>> possible GSoC project for this year. So you would have to manually tag that 
>> content.
>>> 6.  I want to display a tag cloud, and/or list of related documents based 
>>> on what Stanbol knows.
>> That could easily be achieved with a custom backend that stores the 
>> enhancements; it is also possible, to a degree, within Stanbol by storing 
>> them in a Clerezza graph.
>>> 
>>> 
>>> I’m not looking for a solution to all this, I realize that much of it is 
>>> custom, but I feel that the Stanbol services are key to the picture.  I 
>>> can’t find a good example of how this would all fit together, and I don’t 
>>> think I have the semantic / CMS knowledge to just plow forward.  I am 
>>> looking to have a conversation that will get me moving.  Any help you can 
>>> provide would be much appreciated.
>>> 
>>> Thank you,
>>> 
>>> Mark Loper
>>> CTO
>>> Cyren LLC
>>> mark.lo...@cyrenllc.com
>> Hope that helps. Cheers,
>> Rafa
> 
