On Mon, Aug 6, 2012 at 12:59 AM, pancake <panc...@youterm.com> wrote:
>
> Did you tried with parsifal? Anyway.. My parser was simpler than all that
> xml-strict foo. So it worked too with corrupted and partially downloaded rss
> files.
>
> http://hg.youterm.com/mksend/file/14984ebd1529/parsifal
>
I'll investigate this, thanks! I agree that the downside of the current XML
parser I use is that it is a validating parser, meaning the XML has to be
well-formed. I will replace it with a non-validating parser at some point
though.

>> I like to use curl because it handles https, http redirection and also
>> allows me to pass the date of the latest update so HTTP caching will
>> work too. But curl can easily be replaced by wget or fetch though.
>
> I end up using wget and using local files with rss2html to process them.
> Depending on a library for this is imho not suckless

I agree; it doesn't depend on libcurl, just on the command-line curl. You can
easily replace this with wget or fetch like I said.

>>
>>> Actually, the only useful feature was the 'planet' option which
>>> sorts/merges all your feeds in a single timeline.
>> You can specify multiple feeds in a config file and run sfeed_update
>> with this config file as a parameter. Then pipe it through sfeed_html
>> .
>
> Config file for what? Specifying a list of feeds should not be in a config
> file. Maybe in a wrapper script or so.

I agree. sfeed_update is an optional wrapper script; it's the script I use
and I added it for convenience. You can write your own wrapper scripts around
sfeed (I know some people here prefer rc over sh, for example).

>
> Iirc A suckless way should be exporting a tsv where the first word of each
> line is the unix timestamp, so using sort -n should be more unix friendly.
>
> At the end a feed reading should just comvert from various crappy atom/rss
> formats to an unified tsv output. The rest can be done with grep, sort and
> awk. Even the html output
>

I somewhat agree, and this is what sfeed does. The optional sfeed_update
wrapper script does a little more than that though: it makes sure there are
no duplicates, groups items by feed name, etc. A snippet from the
sfeed_update script:

# merge raw files.
# merge(oldfile, newfile)
merge() {
	# unique sort by id, link, title.
	# order by feedname (asc), feedurl (asc) and timestamp (desc).
	(cat "$1" "$2" 2> /dev/null) | sort -t ' ' -u -k7,7 -k4,4 -k3,3 |
		sort -t ' ' -k10,10 -k11,11 -k1r,1
}

> I would suggest exporting json too. That will make templating work on client
> side and no need to do any templating system. Static html is good for lynx...
> Another option i would suggest is to put that template design in config.h

You can convert the tsv format to json; it should be very trivial.

> Can you specify filter for words? Grep will work here?

Definitely, you can grep -v the tsv feeds file or just the stdout of sfeed.

>>
>>> I also wanted to have a way to keep synced my already read links. But that
>>> was a boring task.
>>
>> Atm I just mark all items a day old or newer as new in sfeed_html and
>> sfeed_plain. In your browser visited links will ofcourse be coloured
>> differently.
>>
>
> The workflow i would like to have with feeds is:
>
> Fetch list of new stuff
> Mark them as:
> - uninteresting (stroke, possibly add new filtering rules)
> - read later (have a separate list of urls to read when i have time)
> - mark as read/unread.
> - favorite (flag as imprtant thing)
> - show/hide all news from a single feed
>
> I understand that this workflow shouldnt be handled by sfeed, because thats a
> frontend issue. But having html output does not allows me to do anything of
> that.

Sounds useful, and all of that should be possible, but you would need to
write additional scripts for it.
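An "unread" filter, for example, only needs a few lines of awk. A rough
sketch (not part of sfeed; the script name, the read.urls file and the
assumption that the link is the fourth TSV field, taken from the sort keys in
the merge() snippet above, are all mine):

#!/bin/sh
# unread: print only the items whose link does not appear in the read-urls
# file, i.e. keep only unread items.
# usage: unread read.urls < feeds.tsv > unread.tsv
awk -F '\t' -v readfile="$1" '
BEGIN {
	# load the list of already-read links into an array.
	while ((getline url < readfile) > 0)
		read[url] = 1
	close(readfile)
}
# field 4 is assumed to be the item link.
!($4 in read)
'

Marking items as read would then just be appending their links to read.urls;
favourites or hiding a single feed could be handled in the same grep/awk
style.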
> With json it would be easy to write a frontend like that easily in javascript
> ( blame me, but its fast and its everywhere). There's also a minimalist json
> parser named js0n that can do that from commandline too.
>
> But probably people in this list would expect an awk friendly format instead
> of json. (tsv can be easily converted to json)
>

I opted for an awk-friendly format, but I personally think json is a good
format for data exchange (much better than XML).

I hope that answers your questions; feel free to contact me on IRC too for a
faster answer :)
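PS: for the json conversion mentioned above, a rough awk sketch (not part of
sfeed; the field numbers follow the sort keys in the merge() snippet, the
file names are made up, and titles containing " or \ would still need proper
JSON escaping):

awk -F '\t' '
BEGIN { printf "[" }
{
	# fields: 1 = unix timestamp, 3 = title, 4 = link.
	if (NR > 1)
		printf ","
	printf "{\"timestamp\":%d,\"title\":\"%s\",\"link\":\"%s\"}", $1, $3, $4
}
END { printf "]\n" }
' feeds.tsv > feeds.json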