Hi Stefano,

On Jul 17, 2007, at 8:27 PM, Stefano Mazzocchi wrote:
> Mark Diggory wrote:
>> Hello Simile,
>>
>> I'm hunting for any resources on SPARQL/RDF driven reporting
>> engines. We're reviewing possible solutions for reporting on top of
>> [EMAIL PROTECTED] and given we're very bent on getting RDF usage more
>> mainstream, we are interested in something that would be very flexible
>> and allow various sources to Query and return result sets that can be
>> processed in something like JSP/Velocity/Java/Whatever to produce
>> canned reports we design against the following types of Data sources.
>>
>> Apache/Log4j Logs
>> DSpace Relational Databases (PostgreSQL)
>> DSpace Object Model (Java)
>> Metadata, Policy and History RDF triple-stores
>> (Sesame/Java/SPARQL/Longwell)
>>
>> I've been exploring some of the Simile tools, especially "Referee",
>> with the initial interest of getting Apache Log data into a
>> triple-store and available for generating reports against.
>
> That's not what Referee is about, btw. Referee is not a way to
> transform apache logs into RDF, but it's a way to mine referrer logs
> out of apache logs and find out who links to you and provide a little
> metadata about that. *that* metadata is then dumped out as RDF, not
> the logs.

Yes, I understand what the functionality in Referee is and "is not".
I've fired it up in Eclipse, tested its output against our Apache Logs
and reviewed the codebase. Still, your work shows how to efficiently
parse the logs and "mine" them for content, which is my interest. I was
also considering that the RDF that one would generate would have a
different "spo" structure more appropriate for combining it with other
data about the Web server's resources (i.e. joining the log information
with the Community/Collection/Item/Bitstream database content rather
than the ).

>> While tools for processing Apache logs and gathering statistics do
>> exist, it might be of greater interest to get such data into a common
>> reporting framework with other data sources.
>> My initial ideas were originally based on Relational tools like
>> Crystal Reports/Jasper Reports. I'm seeking any information on
>> hybrid/common solutions that might span sources of various
>> format/protocol and one platform of interest to me at the moment is
>> BIRT:
>>
>> http://www.eclipse.org/birt/phoenix/intro/
>>
>> The logic being that RDF/SPARQL data-sources could be created/adapted
>> to the framework which already has its own report generation tooling.
>> One could go directly from Apache Logs into BIRT, but that would be
>> much less RDF centric and we would still need to explore access to
>> our triple-stores used in our Policy and History subsystems.
>>
>> Any recommendations or suggestions would be received with much
>> gratitude.
>
> Without knowing what your reporting requirements are, there is not
> much I can recommend. Even the definition of reporting is a little
> fuzzy, I'm afraid.

I was being purposefully vague to see what would come back from the
community at large (tools/research I may not be privy to at this time).
But more specifically, I mean the typical definition: a SQL-centric
report generated against an SQL database, like those one would get from
SQL Server, Pentaho, BIRT, Crystal Reports, etc.

> I've seen BIRT, which seems to me one of those things that appeal to
> management more than to developers, as I never understood, really, the
> difference between a web report and a web page generated out of one or
> more database queries... I guess reporting tools appeal to those who
> can't create a database-driven web page on their own.

Think outside the box. I suppose it'd be great if we lived in a world
where children were taught programming languages in grade school and
everyone could be a software developer, but that's not the reality.
It's unrealistic to expect folks to do their jobs efficiently and
productively if they have to invent everything from scratch.
We need a solution for users of our systems to generate reports and not
require a software engineer for such a remedial task... BIRT is just an
Open source example of the type of tool I'd like to make available to
fill that need. It's not a perfect fit (just as your Referee code
isn't). But this is an exploration of available tools I'm currently
completing, and the need is "two-fold":

1.) I'd like not to have to "program" reports for my users (our
Operations team) but give them tools to easily do it without a Computer
Science degree. Sometimes, if you give the user a "blank slate" and
require them to customize, compile and deploy an application, it's too
much, and they get overwhelmed (and with good reason).

> But really, what is a reporting engine for you? Nothing prevents you,
> right now, to take your data, RDFize-it, dump it into a triple store
> of your choice and then run sparql queries on top, obtain an XML
> representation and XSLT transform it to anything you want.

2.) I'd rather not reinvent a "wheel" I have to bug fix, maintain and
document (and which doesn't align with existing best practices and
tools already out there).

> Since I doubt your reports need to be 'fast' in being generated, this
> can work just fine for you and can well integrate into dspace's future
> cocoon-based XML pipelined frontend.

Nor do they change dramatically over time... But rendering/generation
is not such a big issue and can be done anywhere via any technology we
choose... The capability to join disparate sources of data is the more
important requirement.

> But if what you're looking for is an IDE to graphically construct your
> sparql query, or a visual drag/drop interface to construct your report
> as a portal output, no, I haven't seen anything like that nor I would
> hold my breath for it.

That's too bad... It would make for a very powerful tool.

thanks for the comments,
Mark

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology

_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general
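
For illustration only, the log-to-triples step discussed in this thread
might be sketched as follows in Python. This is not Referee's code or
anything from DSpace: the example.org namespaces, the property names
(resource, host, time, status, referrer) and the function names are all
assumptions made up for the sketch. Each log line becomes one "hit"
resource whose requested path is turned into a URI, so that it could
later be joined with other data about the same resource in a triple
store.

```python
import re

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical namespaces, chosen only for this illustration.
VOCAB = "http://example.org/log#"
SITE = "http://example.org"


def literal(value):
    """Quote a string as an N-Triples literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def log_line_to_ntriples(line, hit_id):
    """Turn one log line into N-Triples describing a single 'hit' resource.

    Returns an empty list when the line does not match the combined format.
    No IRI escaping is attempted; a real converter would need it.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return []
    hit = f"<{SITE}/hit/{hit_id}>"
    triples = [
        # The requested path becomes a URI so it can be joined with other data.
        f"{hit} <{VOCAB}resource> <{SITE}{match['path']}> .",
        f"{hit} <{VOCAB}host> {literal(match['host'])} .",
        f"{hit} <{VOCAB}time> {literal(match['time'])} .",
        f"{hit} <{VOCAB}status> {literal(match['status'])} .",
    ]
    # "-" means the client sent no referrer.
    if match["referer"] not in ("", "-"):
        triples.append(f"{hit} <{VOCAB}referrer> <{match['referer']}> .")
    return triples
```

The resulting lines could be loaded into any store that accepts
N-Triples and then queried with SPARQL alongside the other sources.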
