Hi Stefano,

On Jul 17, 2007, at 8:27 PM, Stefano Mazzocchi wrote:

> Mark Diggory wrote:
>> Hello Simile,
>>
>> I'm hunting for any resources on SPARQL/RDF driven reporting
>> engines.  We're reviewing possible solutions for reporting on top of
>> [EMAIL PROTECTED] and given we're very bent on getting RDF usage more
>> mainstream, we are interested in something that would be very flexible
>> and allow various sources to Query and return result sets that can be
>> processed in something like JSP/Velocity/Java/Whatever to produce
>> canned reports we design against the following types of Data sources.
>>
>> Apache/Log4j Logs
>> DSpace Relation Databases (Postgresql)
>> DSpace Object Model (Java)
>> Metadata, Policy and History RDF triple-stores (Sesame/Java/SPARQL/
>> Longwell)
>>
>> I've been exploring some of the Simile tools, especially "Referee",
>> with the initial interest of getting Apache Log data into a
>> triple-store and available for generating reports against.
>
> That's not what Referee is about, btw. Referee is not a way to transform
> apache logs into RDF, but it's a way to mine referrer logs out of apache
> logs and find out who links to you and provide a little metadata about
> that. *that* metadata is then dumped out as RDF, not the logs.

Yes, I understand what Referee does and does not do. I've fired it up
in Eclipse, tested its output against our Apache logs and reviewed the
codebase. Still, your work shows how to efficiently parse the logs and
"mine" them for content, which is my interest. I was also considering
that the RDF one would generate would have a different "spo" structure,
more appropriate for combining it with other data about the Web
server's resources (i.e. joining the log information with the
Community/Collection/Item/Bitstream database content rather than the ).
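
To make the "different spo structure" idea concrete, here is a minimal sketch in Python. It is an illustration only and not Referee's code. It assumes logs in Apache Common Log Format, and the `http://example.org/...` namespaces and the `log_line_to_triples` helper are hypothetical names made up for this example. Each hit becomes a subject that points at the requested resource, so the hits can later be joined with whatever else is known about that resource.

```python
import re

# Apache Common Log Format: host ident user [time] "request" status bytes
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical namespaces, for illustration only.
LOG = "http://example.org/log#"
SITE = "http://example.org"


def log_line_to_triples(line, hit_id):
    """Return N-Triples lines for one log entry, or [] if it does not parse."""
    m = LOG_LINE.match(line)
    if m is None:
        return []
    hit = f"<{LOG}hit{hit_id}>"
    # Assumes the request path is already a valid IRI path (no escaping done).
    resource = f"<{SITE}{m['path']}>"
    return [
        f"{hit} <{LOG}resource> {resource} .",
        f'{hit} <{LOG}host> "{m["host"]}" .',
        f'{hit} <{LOG}time> "{m["time"]}" .',
        f'{hit} <{LOG}status> "{m["status"]}" .',
    ]


# Made-up sample line; the handle is not a real item.
sample = ('18.7.0.1 - - [17/Jul/2007:20:27:00 -0400] '
          '"GET /handle/1721.1/1234 HTTP/1.1" 200 5120')
for triple in log_line_to_triples(sample, 1):
    print(triple)
```

Keying the triples on the requested resource, rather than on the referrer, is the design choice that would make a later join against item-level data straightforward.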


>> While tools for
>> processing Apache logs and gathering statistics do exist, it might be
>> of greater interest to get such data into a common reporting
>> framework with other data sources. My initial ideas were originally
>> based on Relational tools like Crystal Reports/Jasper Reports.  I'm
>> seeking any information on hybrid/common solutions that might span
>> sources of various format/protocol and one platform of interest to me
>> at the moment is BIRT:
>>
>> http://www.eclipse.org/birt/phoenix/intro/
>>
>> The logic being that RDF/SPARQL data-sources could be created/adapted
>> to the framework which already has its own report generation tooling.
>> One could go directly from Apache Logs into BIRT, but that would be
>> much less RDF centric and we would still need to explore access to
>> our triple-stores used in our Policy and History subsystems.
>>
>> Any recommendations or suggestions would be received with much
>> gratitude.
>
> Without knowing what your reporting requirements are, there is not much I
> can recommend. Even the definition of reporting is a little fuzzy, I'm
> afraid.

I was being purposefully vague to see what would come back from the
community at large (tools/research I may not be privy to at this
time). More specifically, I mean the typical SQL-centric report
generated against an SQL database, like those one would get from
SQL Server, Pentaho, BIRT, Crystal Reports, etc.

> I've seen BIRT, which seems to me one of those things that appeal to
> management more than to developers, as I never understood, really, the
> difference between a web report and a web page generated out of one or
> more database queries... I guess reporting tools appeal to those who
> can't create a database-driven web page on their own.

Think outside the box. I suppose it'd be great if we lived in a
world where children were taught programming languages in grade
school and everyone could be a software developer, but that's not the
reality. It's unrealistic to expect folks to do their jobs
efficiently and productively if they have to invent everything from
scratch. We need a solution that lets users of our systems generate
reports without requiring a software engineer for such a routine task...

BIRT is just an open-source example of the type of tool that
might fill a need I'd like to address. It's not a perfect fit
(just as your Referee code isn't). But this is an exploration of
available tools that I'm currently completing, and the need is "two-fold":

1.) I'd like not to have to "program" reports for my users (our
Operations team) but give them tools to easily do it without a
Computer Science degree. Sometimes, if you give users a "blank
slate" and require them to customize, compile and deploy an
application, it's too much, and they get overwhelmed (and with good
reason).

> But really, what is a reporting engine for you? Nothing prevents you,
> right now, to take your data, RDFize-it, dump it into a triple store of
> your choice and then run sparql queries on top, obtain an XML
> representation and XSLT transform it to anything you want.

2.) I'd rather not reinvent a "wheel" I have to bug fix, maintain and  
document (and which doesn't align with existing best practices and  
tools already out there).

> Since I doubt your reports need to be 'fast' in being generated, this
> can work just fine for you and can well integrate into dspace's future
> cocoon-based XML pipelined frontend.

Nor do they change dramatically over time... But rendering/generation
is not such a big issue and can be done anywhere, via any technology
we choose. The ability to join disparate sources of data is the more
important requirement.
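
As a rough sketch of the kind of join I mean, the Python below counts log hits per repository item by matching request paths against item handles. Everything in it is an assumption for illustration: the `item` table and its columns are hypothetical and not the real DSpace schema, the in-memory SQLite database merely stands in for PostgreSQL, and the handles and titles are made up.

```python
import sqlite3
from collections import Counter


def hits_per_item(paths, conn):
    """Count log hits per item title by matching request paths to handles."""
    counts = Counter(paths)
    report = {}
    for handle, title in conn.execute("SELECT handle, title FROM item"):
        # Assumes items are served under /handle/<handle>.
        report[title] = counts.get("/handle/" + handle, 0)
    return report


# Stand-in for the relational side (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (handle TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [("1721.1/1", "Thesis A"), ("1721.1/2", "Report B")],
)

# Stand-in for request paths already extracted from the Apache logs.
paths = ["/handle/1721.1/1", "/handle/1721.1/1",
         "/handle/1721.1/2", "/robots.txt"]

print(hits_per_item(paths, conn))
```

This is exactly the sort of glue I would rather not hand-write and maintain for every report, which is why a common framework over all the sources is attractive.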

> But if what you're looking for is an IDE to graphically construct your
> sparql query, or a visual drag/drop interface to construct your report
> as a portal output, no, I haven't seen anything like that nor I would
> hold my breath for it.

That's too bad... It would make for a very powerful tool.

thanks for the comments,
Mark

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology 
