[ https://issues.apache.org/jira/browse/STANBOL-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viktor Gal updated STANBOL-1214:
--------------------------------
Description:
The format of the Freebase dump has changed: it now contains full URIs, so
the fbranking.sh script for counting incoming links is obsolete. Here's a
quick fix for the new dump format:
# keep m.* -> m.* triples, take the object column, strip the namespace, count
gunzip -c db/freebase-rdf-2013-11-17.gz \
| grep '^<http://rdf.freebase.com/ns/m\..*<.*>.*<http://rdf.freebase.com/ns/m\.' \
| cut -f 3 \
| sed 's/.*\/ns\/\(.*\)>/\1/g' \
| sort -S "$MAX_SORT_MEM" \
| uniq -c \
| sort -nr -S "$MAX_SORT_MEM" > "$INCOMING_FILE"
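To see what each stage does, here is a small self-contained sketch that runs the same filter and count stages on a few hand-written triples. It assumes the new dump is tab-separated (subject, predicate, object, "."), as the `cut -f 3` step implies; the m.* IDs, the example.knows predicate and the function names sample_triples/count_incoming are made up for illustration. The GNU-specific `-S` memory hint is dropped so it runs with any POSIX sort.

```shell
#!/bin/sh
# Toy check of the incoming-link pipeline. The input imitates the assumed
# new dump layout: tab-separated subject, predicate, object, "." per line.
# All m.* IDs and predicate names below are invented for illustration.

NS='http://rdf.freebase.com/ns'

# Hypothetical sample: three m.* -> m.* links plus one literal-valued triple.
sample_triples() {
  printf '%s\t%s\t%s\t.\n' \
    "<$NS/m.0a>" "<$NS/example.knows>" "<$NS/m.0b>" \
    "<$NS/m.0c>" "<$NS/example.knows>" "<$NS/m.0b>" \
    "<$NS/m.0a>" "<$NS/example.knows>" "<$NS/m.0c>" \
    "<$NS/m.0a>" "<$NS/type.object.name>" '"Alpha"@en'
}

# Same stages as the pipeline above: keep triples whose subject and object
# are both m.* URIs, take the object column, strip the namespace and the
# closing '>', then count occurrences and sort by count, highest first.
count_incoming() {
  grep '^<http://rdf.freebase.com/ns/m\..*<.*>.*<http://rdf.freebase.com/ns/m\.' \
    | cut -f 3 \
    | sed 's/.*\/ns\/\(.*\)>/\1/g' \
    | sort \
    | uniq -c \
    | sort -nr
}

sample_triples | count_incoming
```

Running it should print m.0b with a count of 2 followed by m.0c with a count of 1; the type.object.name triple is dropped by the grep because its object is a literal rather than an m.* URI.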
> Fix for fbranking.sh script
> ---------------------------
>
> Key: STANBOL-1214
> URL: https://issues.apache.org/jira/browse/STANBOL-1214
> Project: Stanbol
> Issue Type: Bug
> Reporter: Viktor Gal
--
This message was sent by Atlassian JIRA
(v6.1#6144)